THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


(FOUNDED BY H. C. CARVER) 


THE OFFICIAL JOURNAL OF THE INSTITUTE 
OF MATHEMATICAL STATISTICS 


Contents 


Testing Compound Symmetry in a Normal Multivariate Distribution. 
D. F. Votaw, Jr. 


Branching Processes. T. E. Harris 


Most Powerful Tests of Composite Hypotheses. I. Normal Distributions. 
E. L. Lehmann and C. Stein 


Symbolic Matrix Derivatives. Paul S. Dwyer and M. S. Mac- 


On the Limiting Distributions of Estimates Based on Samples from 
Finite Universes. William G. Madow 


A Non-Parametric Test of Independence. Wassily Hoeffding 
On Prediction in Stationary Time Series. Herman O. A. Wold 


Generalization to N Dimensions of Inequalities of the Tchebycheff 
Type. Burton H. Camp 


Boundaries of Minimum Size in Binomial Sampling. R. L. Plackett 


Notes: 
Non-Parametric Tolerance Limits. R. B. Murray 
The Fourth Degree Exponential Distribution Function. Leo A. Aroian 
An Approximation to the Binomial Summation. 


Abstracts of Papers 

News and Notices 

Adoption of the New Constitution 

Report on the Madison Meeting of the Institute 




Vol. XIX, No. 4 — December, 1948 





THE ANNALS 
OF MATHEMATICAL STATISTICS 


EDITED BY 
S.S. WILKS, Editor 
M. S. BARTLETT HARALD CRAMER 
WILLIAM G. COCHRAN W. EDWARDS DEMING 
ALLEN T. CRAIG J. L. DOOB 
C. C. CRAIG W. FELLER 
HAROLD HOTELLING 


WITH THE COOPERATION OF 


T. W. ANDERSON, JR.   CHURCHILL EISENHART 
DAVID BLACKWELL   M. A. GIRSHICK 
J. H. CURTISS   PAUL R. HALMOS   FREDERICK MOSTELLER 
J. F. DALY   PAUL G. HOEL   H. E. ROBBINS 
HAROLD F. DODGE   HENRY SCHEFFÉ 
PAUL S. DWYER   JACOB WOLFOWITZ 
WILLIAM G. MADOW 


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the 
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore 2, 
Md. Subscriptions, renewals, orders for back numbers and other business 
communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt. 
Royal & Guilford Aves., Baltimore 2, Md., or to the Secretary of the Institute 
of Mathematical Statistics, P. S. Dwyer, 116 Rackham Hall, University of 
Michigan, Ann Arbor, Mich. 


Changes in mailing address which are to become effective for a given 
issue should be reported to the Secretary on or before the 15th of the 
month preceding the month of that issue. The months of issue are March, 
June, September and December. 


Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS 
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts 
should be typewritten double-spaced with wide margins, and the original copy 
should be submitted. Footnotes should be reduced to a minimum and whenever 
possible replaced by a bibliography at the end of the paper; formulae in 
footnotes should be avoided. Figures, charts, and diagrams should be drawn on 
plain white paper or tracing cloth in black India ink twice the size they are to 
be printed. Authors are requested to keep in mind typographical difficulties 
of complicated mathematical formulae. 




Authors will ordinarily receive only galley proofs. Fifty reprints without 
covers will be furnished free. Additional reprints and covers furnished at cost. 


The subscription price for the ANNALS is $8.00 inside the Western Hemisphere 
and $5.00 elsewhere. Single copies $3.00. Back numbers are available 
at $8.00 per volume or $3.00 per single issue. 


COMPOSED AND PRINTED AT THE 
WAVERLY PRESS, Inc. 
BALTIMORE, MD., U. S. A. 


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the act of March 3, 1879 













TESTING COMPOUND SYMMETRY IN A NORMAL 
MULTIVARIATE DISTRIBUTION 


By David F. Votaw, Jr. 


Yale University 


Summary. In this paper test criteria are developed for testing hypotheses 
of “compound symmetry” in a normal multivariate population of $t$ variates 
($t \ge 3$) on the basis of samples. A feature common to the twelve hypotheses 
considered is that the set of $t$ variates is partitioned into mutually exclusive 
subsets of variates. In regard to the partitioning, the twelve hypotheses can be 
divided into two contrasting but very similar types, and the six in one type can 
be paired off in a natural way with the six in the other type. Three of the 
hypotheses within a given type are associated with the case of a single sample 
and moreover are simple modifications of one another; the remaining three are 
direct extensions of the first three, respectively, to the case of $k$ samples 
($k \ge 2$). The gist of any of the hypotheses is indicated in the following 
statement of one, denoted by $H_1(mvc)$: within each subset of variates the means 
are equal, the variances are equal and the covariances are equal, and between any 
two distinct subsets the covariances are equal. 

The twelve sample criteria for testing the hypotheses are developed by the 
Neyman-Pearson likelihood-ratio method. The following results are obtained 
for each criterion (assuming that the respective null hypotheses are true) for 
any admissible partition of the $t$ variates into subsets and for any sample size, 
$N$, for which the criterion's distribution exists: (i) the exact moments; (ii) an 
identification of the exact distribution as the distribution of a product of 
independent beta variates; (iii) the approximate distribution for large $N$. Exact 
distributions of the single-sample criteria are given explicitly for special values 
of $t$ and special partitionings. 

Certain psychometric and medical research problems in which hypotheses of 
compound symmetry are relevant are discussed in section 1. Sections 2-6 give 
statements of the hypotheses and an illustration, for $H_1(mvc)$, of the technique 
of obtaining the moments and identifying the distributions. Results for the 
other criteria are given in sections 7-8. Illustrative examples showing 
applications of the results are given in section 9. 


1. Introduction. In studying psychological examinations, or other measuring 
devices, one may wish to test whether several forms of an examination may be 
used interchangeably. Consider the case of three forms, and assume that 
scores of individuals on the three forms have a normal 3-variate distribution. 
The hypothesis of interchangeability is equivalent to the hypothesis that in the 
normal distribution the means are equal, the variances are equal, and the 
covariances are equal. When this hypothesis is true, the normal distribution is 
invariant over all permutations of the variates and is said to have complete 
symmetry. It is frequently more important, however, not only to test that the forms 
have completely symmetric relations with each other but also that they are 
interchangeable with regard to their relation to some outside criterion measure 
(e.g., the criterion might be skill in a given task). Assuming that the scores of 
individuals on the three forms and the criterion have a normal 4-variate 
distribution, the hypothesis of interchangeability is equivalent to the hypothesis of 
equality of means, equality of variances, and equality of covariances among the 
three forms and equality of covariances between forms and criterion. When 
this hypothesis is true, the 4-variate normal distribution is invariant over all 
permutations of the three variates (associated with forms) among themselves, 
and so the variance-covariance matrix has the following form: 





where the quantity A represents the variance of the criterion measure. A 
normal distribution for which this hypothesis is true is said to have compound 
symmetry (of type I). A more general case of compound symmetry (of type I) 
arises when there are several examinations (no two of which need have the same 
number of forms) and several outside criteria. 
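
The type I pattern for three forms and one criterion can be sketched numerically. In the following Python fragment only the symbol A (the criterion variance) is taken from the text; the names b, c, d for the common form variance, the form-criterion covariance and the form-form covariance, the ordering of the variates (criterion first), and all numerical values are assumptions made for illustration.

```python
from itertools import permutations

def type1_covariance(a, b, c, d, n_forms=3):
    """Covariance matrix with the criterion first, then n_forms forms.

    a: variance of the criterion; b: common variance of the forms;
    c: common form-criterion covariance; d: common form-form covariance.
    """
    t = 1 + n_forms
    m = [[0.0] * t for _ in range(t)]
    m[0][0] = a
    for i in range(1, t):
        m[0][i] = m[i][0] = c
        for j in range(1, t):
            m[i][j] = b if i == j else d
    return m

def permute(m, order):
    """Apply the same reordering to the rows and the columns of m."""
    return [[m[i][j] for j in order] for i in order]

sigma = type1_covariance(a=4.0, b=2.0, c=0.5, d=1.0)
# Permuting the three forms (indices 1, 2, 3) among themselves leaves sigma unchanged.
invariant = all(permute(sigma, (0,) + p) == sigma
                for p in permutations((1, 2, 3)))
print(invariant)
```

Interchanging the criterion with one of the forms would, in general, change the matrix, which is why the symmetry is compound rather than complete.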

The hypothesis of complete symmetry may arise in certain medical-research 
problems. For example, suppose a measurement (e.g., % CO$_2$ in blood) is made 
at each of three times (say $T_1$, $T_2$, $T_3$) on an experimental animal and assume 
that the three quantities have a normal 3-variate distribution; one may then be 
interested in testing the hypothesis of complete symmetry on the basis of 
measurements (considered as a random sample) made on several experimental animals. 
More generally, let there be two characteristics, say $U$ and $W$ (e.g., % CO$_2$ in 
blood and % O$_2$ in blood), which are both measured at each of two times, $T_1$, 
$T_2$. Let it be assumed that the four quantities, which we represent as $UT_1$, 
$UT_2$, $WT_1$, $WT_2$, have a normal 4-variate distribution. One may then be 
interested in testing the hypothesis that the means of the first two variates are 
equal, the means of the second two are equal, and the variance-covariance matrix 
has the form: 

When this hypothesis is true, the 4-variate distribution is said to have compound 
symmetry (of type II). A more general case of compound symmetry (of type II) 
arises when there are $h$ characteristics and $n$ times ($h, n = 2, 3, \cdots$). 

Either of the two types of compound symmetry is a direct extension of complete 
symmetry. Wilks [5] has thoroughly treated the sampling theory of certain 
criteria for testing various hypotheses of complete symmetry regarding a normal 
multivariate distribution. 

The problems dealt with in this paper are: (i) to give sample criteria for 
testing hypotheses of compound symmetry regarding a normal multivariate 
distribution, and (ii) to give the moments and identify the distribution of each 
sample criterion when the corresponding hypothesis is true. 

The hypotheses are stated in section 2. Certain properties of compound sym- 
metric normal distributions are given in section 3. Sections 4, 5, and 6 together 
give the method of deriving each sample criterion and the methods of obtaining 
the criterion’s moments and identifying its distribution; the methods are illus- 
trated for one of the hypotheses. Sections 7-8 give the other criteria and their 
moments together with approximate distributions of the criteria for large sample 
sizes. Exact distributions of some of the criteria are given in section 7g for 
certain special compound symmetries. Section 9 contains two illustrative 
examples. 


2. Statements of hypotheses. Let $\Pi$ be a normal $t$-variate population and 
$X_i$ ($i = 1, \cdots, t$) ($t \ge 3$) be the $i$-th variate in $\Pi$. Let the set of 
variates $X_1, X_2, \cdots, X_t$ be partitioned into $q$ mutually exclusive subsets of 
which, say, $b$ subsets contain exactly one variate each and the remaining 
$q - b = h$ subsets (where $h \ge 1$) contain $n_1, n_2, \cdots, n_h$ variates, 
respectively, where $n_a \ge 2$ ($a = 1, \cdots, h$; $b + \sum_{a=1}^{h} n_a = t$). No 
generality is lost in assuming that the $t$ variates are ordered so that the first 
$b$ belong to the $b$ subsets containing one variate each, the next $n_1$ variates 
belong to the $(b + 1)$-th subset, $\cdots$, the last $n_h$ variates to the $q$-th 
subset, where $n_1 \le n_2 \le \cdots \le n_h$. Let $(1^b, n_1, n_2, \cdots, n_h)$ 
represent such a partition of the variates $X_1, \cdots, X_t$ into subsets; when 
$b = 0$ the term $1^b$ will be omitted. The notation can be simplified when 
$n_1, n_2, \cdots, n_h$ are not all unequal; e.g., $(1^b, 2, 2)$ can be written as 
$(1^b, 2^2)$. 
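
The ordering convention just described can be made concrete with a small helper. The function name `partition_groups` and its argument names are hypothetical; it is only a sketch of how a partition $(1^b, n_1, \cdots, n_h)$ maps onto 1-based variate indices under the ordering assumed in the text.

```python
def partition_groups(b, sizes):
    """Index groups (1-based) for the partition (1^b, n_1, ..., n_h)."""
    if any(n < 2 for n in sizes):
        raise ValueError("each n_a must be at least 2")
    if list(sizes) != sorted(sizes):
        raise ValueError("sizes must satisfy n_1 <= n_2 <= ... <= n_h")
    # The first b variates form the b single-variate subsets.
    groups = [[s] for s in range(1, b + 1)]
    start = b + 1
    for n in sizes:
        groups.append(list(range(start, start + n)))
        start += n
    return groups

# The partition (1, 3) of t = 4 variates: one criterion and three forms.
groups = partition_groups(1, [3])
print(groups)   # [[1], [2, 3, 4]]
```

Here $q$ is `len(groups)` and $t$ is the total number of indices listed.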

In the statement of each of the following six hypotheses it is assumed that there 
is a preassigned partition $(1^b, n_1, n_2, \cdots, n_h)$ of the $t$ variates into 
$q$ subsets ($q = b + h$). 

(1) $H_1(mvc)$: The hypothesis that within each subset of variates the means 
are equal, the variances are equal, and the covariances are equal and that 
between any two distinct subsets of variates the covariances are equal. 

(2) $H_1(vc)$: The hypothesis that within each subset of variates the variances 
are equal and the covariances are equal and that between any two distinct 
subsets of variates the covariances are equal. 

(3) $H_1(m)$: The hypothesis that within each subset of variates the means are 
equal, given that $H_1(vc)$ is true. 

(4) $H_k(MVC \mid mvc)$: The hypothesis that $k$ normal $t$-variate distributions are 
the same, given that they all satisfy $H_1(mvc)$ for a particular division of the 
variates into subsets ($k \ge 2$). 

(5) $H_k(VC \mid mvc)$: The hypothesis that $k$ normal $t$-variate distributions have 
the same variance-covariance matrix, given that they all satisfy $H_1(mvc)$ for a 
particular division of the variates into subsets ($k \ge 2$). 

(6) $H_k(M \mid mVC)$: The hypothesis that $k$ normal $t$-variate distributions are 
the same, given that they all satisfy $H_1(mvc)$ for a particular division of the 
variates into subsets and that they all have the same variance-covariance matrix 
($k \ge 2$). 

Any of the hypotheses stated above can be expressed in terms of an invariance 
condition on the normal $t$-variate distribution (or distributions); e.g., $H_1(mvc)$ 
is equivalent to the hypothesis that the distribution is invariant over all 
permutations of the variates within subsets. The pattern of symmetry present in the 
variance-covariance matrix of the distribution when any of the above six 
hypotheses is true is given in section 3 (see (3.2)). 

Six additional hypotheses, $\bar H_1(mvc)$, $\bar H_1(vc)$, $\cdots$, $\bar H_k(M \mid mVC)$, 
which are modifications of $H_1(mvc)$, $H_1(vc)$, $\cdots$, $H_k(M \mid mVC)$, respectively, 
will also be considered. In regard to any of these six $\bar H$ hypotheses, it is 
assumed that there is a partition $(n^h)$ ($n = 2, 3, \cdots$) of the $t$ variates 
($t = nh$) and that in each subset the variates are in a given order; thus each 
subset has $n$ variates and between any two distinct subsets of variates there are 
$n^2$ covariances, which form an $n \times n$ “block” in the variance-covariance 
matrix of the distribution (see (3.4)). The hypotheses may now be stated as follows: 

$\bar H_1(mvc)$: The hypothesis that within each subset of variates the means are 
equal, the variances are equal, and the covariances are equal and that between 
any two distinct subsets of variates the diagonal covariances are equal and the 
off-diagonal covariances are equal. 

$\bar H_1(vc)$: The hypothesis that within each subset of variates the variances are 
equal and the covariances are equal and that between any two distinct subsets 
of variates the diagonal covariances are equal and the off-diagonal covariances are 
equal. 

The statement of any of the hypotheses $\bar H_1(m)$, $\bar H_k(MVC \mid mvc)$, 
$\bar H_k(VC \mid mvc)$, and $\bar H_k(M \mid mVC)$ is obtained from the statement of the 
corresponding $H$ hypothesis by simply substituting $\bar H$ for $H$. The pattern of 
symmetry present in the variance-covariance matrix of the distribution when any of 
the six $\bar H$ hypotheses is true is given in section 3 (see (3.4)), from which the 
appropriate invariance condition on the normal distribution can be obtained. 

A test of any of the hypotheses $H_1(mvc)$, $\bar H_1(mvc)$, $H_1(vc)$, $\bar H_1(vc)$, 
$H_1(m)$, $\bar H_1(m)$ is based on a random sample from a normal multivariate 
distribution; a test of any of the remaining hypotheses is based on $k$ random 
samples from $k$ normal multivariate distributions, respectively ($k \ge 2$). 

A normal distribution for which an $H$ or $\bar H$ hypothesis is true will be called 
compound symmetric. In the special case where the compound symmetry holds 
for a partition $(t)$ of the $t$ variates, any $H$ hypothesis and the $\bar H$ hypothesis 
corresponding to it are identical; in this case the normal distribution will be 
called completely symmetric. Problems (i) and (ii) (see section 1) have been 
solved completely by Wilks [5] for $H_1(mvc)$, $H_1(vc)$, and $H_1(m)$ for the case of 
complete symmetry. 


3. Block symmetric matrices and vectors. Let $m_i$ be the mean value of $X_i$ 
and $\|\rho_{ij}\sigma_i\sigma_j\|$ be the variance-covariance matrix of $X_1, \cdots, X_t$ 
($i, j = 1, \cdots, t$) ($\rho_{ij}$ is the coefficient of correlation between $X_i$ and 
$X_j$). The joint probability density function$^1$ of $X_1, X_2, \cdots, X_t$ is 

(3.1) $f(X_1, X_2, \cdots, X_t) = \pi^{-t/2}\,|G_{ij}|^{1/2} \exp\big[-\sum_{i,j} G_{ij}(X_i - m_i)(X_j - m_j)\big]$, 

where $\|G_{ij}\|$ is positive definite and its inverse $\|G^{ij}\| = \|2\rho_{ij}\sigma_i\sigma_j\|$. 

When any of the $H$ hypotheses is true (see section 2), we represent $\|G^{ij}\|$ 
by $\|A^{ij}\|$ (also $\|G_{ij}\|$ by $\|A_{ij}\|$) which can be written as (3.2) (see page 452), 
where $A^{ss'} = A^{s's}$ ($s, s' = 1, \cdots, b$) and $D^{aa'} = D^{a'a}$ 
($a, a' = 1, \cdots, h$; $a \ne a'$). The $A$'s and $B$'s with single superscripts and the 
$C$'s and $D$'s have been introduced to indicate the block pattern clearly. In general 
$C^{sa} = C^{s'a'}$ only if $s = s'$ and $a = a'$ ($s = 1, \cdots, b$; $a = 1, \cdots, h$). 
$\|A_{ij}\|$ and $\|A^{ij}\|$ have the same block pattern of symmetry. 

The blocks in (3.2) are formed by making a partition $(1^b, n_1, n_2, \cdots, n_h)$ of 
the $t$ rows and $t$ columns of $\|A^{ij}\|$. A matrix having the block pattern of 
symmetry of (3.2) will be called block symmetric of type I. Clearly a block symmetric 
matrix of type I is invariant over all permutations of its rows and columns within 
the subsets determined by $(1^b, n_1, \cdots, n_h)$, if the row interchanges and column 
interchanges are the same. Also, a $t$-component vector will be called block 
symmetric if the order of values of the components is invariant over all 
permutations of the components within groups determined by $(1^b, n_1, \cdots, n_h)$. 

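The determinant of the block symmetric matrix $\|A_{ij}\|$ is 

(3.3) $|A_{ij}| = K \prod_{a=1}^{h} (A_a - B_a)^{n_a - 1}$, 

where 

$$K = \begin{vmatrix}
A_{11} & \cdots & A_{1b} & C'_{11} & \cdots & C'_{1h} \\
\vdots &        & \vdots & \vdots  &        & \vdots  \\
A_{b1} & \cdots & A_{bb} & C'_{b1} & \cdots & C'_{bh} \\
C'_{11} & \cdots & C'_{b1} & A'_{1} & \cdots & D'_{1h} \\
\vdots  &        & \vdots  & \vdots &        & \vdots  \\
C'_{1h} & \cdots & C'_{bh} & D'_{h1} & \cdots & A'_{h}
\end{vmatrix},$$ 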


$^1$ In general a chance quantity and the variable of its distribution function will 
be denoted by the same symbol. 

where $C'_{sa} = C_{sa}\sqrt{n_a}$, $A'_a = A_a + (n_a - 1)B_a$, and 
$D'_{aa'} = D_{aa'}\sqrt{n_a n_{a'}}$ ($s = 1, \cdots, b$; $a, a' = 1, \cdots, h$; $a \ne a'$); 
$A_{ss'}$, $C_{sa}$, $A_a$, $B_a$, and $D_{aa'}$ are the cofactors of $A^{ss'}$, $C^{sa}$, 
$A^a$, $B^a$, and $D^{aa'}$, respectively, in (3.2). The ellipsoid defined by 
$\sum_{i,j} A_{ij}(X_i - m_i)(X_j - m_j) = \tau^2$ ($\tau$ fixed and $> 0$) has $(n_a - 1)$ axes 
of equal length ($a = 1, \cdots, h$); and each of the remaining $q$ axes is inclined to 
the coordinate axes so that its direction cosines have the same block symmetry as 
the set of diagonal elements in (3.2). 
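
The factorization (3.3) can be checked numerically. The sketch below builds a block symmetric matrix of type I for the partition $(1, 2, 3)$ from made-up entries and compares its determinant with $K \prod_a (A_a - B_a)^{n_a - 1}$. The function names, the argument layout (`C[s][a]`, `D[a][a']`) and every numerical value are assumptions chosen for illustration, and the determinant routine is a plain Gaussian elimination rather than anything taken from the paper.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, value = len(a), 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[p][k] == 0.0:
            return 0.0
        if p != k:
            a[k], a[p] = a[p], a[k]
            value = -value
        value *= a[k][k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
    return value

def type1_matrix(A_single, C, A, B, D, sizes):
    """Full t x t block symmetric matrix of type I for (1^b, n_1, ..., n_h)."""
    b = len(A_single)
    # owner[i] = ("s", s) for a single-variate subset, ("g", a) for subset a
    owner = [("s", s) for s in range(b)]
    for a, n in enumerate(sizes):
        owner += [("g", a)] * n
    t = len(owner)
    m = [[0.0] * t for _ in range(t)]
    for i, (ki, xi) in enumerate(owner):
        for j, (kj, xj) in enumerate(owner):
            if ki == "s" and kj == "s":
                m[i][j] = A_single[xi][xj]
            elif ki == "s":
                m[i][j] = C[xi][xj]
            elif kj == "s":
                m[i][j] = C[xj][xi]
            elif xi == xj:
                m[i][j] = A[xi] if i == j else B[xi]
            else:
                m[i][j] = D[xi][xj]
    return m

def reduced_matrix(A_single, C, A, B, D, sizes):
    """The q x q matrix whose determinant is K in (3.3)."""
    b, h = len(A_single), len(sizes)
    k = [[0.0] * (b + h) for _ in range(b + h)]
    for s in range(b):
        for s2 in range(b):
            k[s][s2] = A_single[s][s2]
        for a in range(h):
            k[s][b + a] = k[b + a][s] = C[s][a] * math.sqrt(sizes[a])
    for a in range(h):
        for a2 in range(h):
            if a == a2:
                k[b + a][b + a] = A[a] + (sizes[a] - 1) * B[a]
            else:
                k[b + a][b + a2] = D[a][a2] * math.sqrt(sizes[a] * sizes[a2])
    return k

# Partition (1, 2, 3): b = 1, h = 2, t = 6, q = 3 (all numbers are made up).
sizes = [2, 3]
A_single = [[5.0]]
C = [[0.3, 0.4]]                 # C[s][a]
A, B = [3.0, 4.0], [1.0, 0.5]
D = [[0.0, 0.2], [0.2, 0.0]]     # D[a][a'], diagonal entries unused

lhs = det(type1_matrix(A_single, C, A, B, D, sizes))
rhs = det(reduced_matrix(A_single, C, A, B, D, sizes))
for a, n in enumerate(sizes):
    rhs *= (A[a] - B[a]) ** (n - 1)
print(math.isclose(lhs, rhs, rel_tol=1e-9))
```

The identity is algebraic, so it holds for any block symmetric matrix of type I, positive definite or not; the check replaces a $t \times t$ determinant by a $q \times q$ one.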

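When any of the $\bar H$ hypotheses is true, we represent $\|G^{ij}\|$ by $\|A^{ij}\|$ 
(also $\|G_{ij}\|$ by $\|A_{ij}\|$) which can be written as 

(3.4) 
$$\|A^{ij}\| = \begin{Vmatrix}
A^{1} & B^{1} & \cdots & B^{1} &        & C^{1h} & D^{1h} & \cdots & D^{1h} \\
B^{1} & A^{1} & \cdots & B^{1} &        & D^{1h} & C^{1h} & \cdots & D^{1h} \\
\vdots & \vdots &      & \vdots & \cdots & \vdots & \vdots &        & \vdots \\
B^{1} & B^{1} & \cdots & A^{1} &        & D^{1h} & D^{1h} & \cdots & C^{1h} \\
      & \vdots &       &        &        &        & \vdots &        &        \\
C^{h1} & D^{h1} & \cdots & D^{h1} &      & A^{h} & B^{h} & \cdots & B^{h} \\
D^{h1} & C^{h1} & \cdots & D^{h1} &      & B^{h} & A^{h} & \cdots & B^{h} \\
\vdots & \vdots &        & \vdots & \cdots & \vdots & \vdots &      & \vdots \\
D^{h1} & D^{h1} & \cdots & C^{h1} &      & B^{h} & B^{h} & \cdots & A^{h}
\end{Vmatrix},$$ 

where the blocks are formed by a partition $(n^h)$ of the $t$ rows and $t$ columns; thus 
each block is an $n \times n$ array. $\|A^{ij}\|$ and $\|A_{ij}\|$ have the same block 
pattern of symmetry. 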


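A matrix having the block pattern of symmetry of (3.4) will be called block 
symmetric of type II. The determinant of $\|A_{ij}\|$ is 

(3.5) $|A_{ij}| = K^{n-1} Q$, 

where 

$$K = \begin{vmatrix}
A_1 - B_1 & C_{12} - D_{12} & \cdots & C_{1h} - D_{1h} \\
C_{21} - D_{21} & A_2 - B_2 & \cdots & C_{2h} - D_{2h} \\
\vdots & \vdots & & \vdots \\
C_{h1} - D_{h1} & C_{h2} - D_{h2} & \cdots & A_h - B_h
\end{vmatrix}, \qquad
Q = \begin{vmatrix}
A'_1 & C'_{12} & \cdots & C'_{1h} \\
C'_{21} & A'_2 & \cdots & C'_{2h} \\
\vdots & \vdots & & \vdots \\
C'_{h1} & C'_{h2} & \cdots & A'_h
\end{vmatrix},$$ 

where $A'_a = A_a + (n - 1)B_a$ and $C'_{aa'} = C_{aa'} + (n - 1)D_{aa'}$ 
($a, a' = 1, 2, \cdots, h$; $a \ne a'$); $A_a$, $B_a$, $C_{aa'}$, and $D_{aa'}$ are the 
cofactors of $A^a$, $B^a$, $C^{aa'}$, $D^{aa'}$, respectively, in (3.4). 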
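
A numerical check of (3.5) may be sketched in the same spirit. The fragment below builds a block symmetric matrix of type II for the partition $(3^2)$ and compares its determinant with $K^{n-1}Q$. The helper names and all numbers are illustrative assumptions; “diagonal” covariances between two subsets are taken to be those between variates occupying the same position in the two subsets, as in the statement of the $\bar H$ hypotheses.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, value = len(a), 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[p][k] == 0.0:
            return 0.0
        if p != k:
            a[k], a[p] = a[p], a[k]
            value = -value
        value *= a[k][k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
    return value

def type2_matrix(A, B, C, D, n):
    """Full nh x nh block symmetric matrix of type II for the partition (n^h)."""
    h = len(A)
    t = n * h
    m = [[0.0] * t for _ in range(t)]
    for i in range(t):
        for j in range(t):
            a, a2 = i // n, j // n              # subsets containing variates i, j
            same_position = (i % n == j % n)    # same place within the subsets
            if a == a2:
                m[i][j] = A[a] if same_position else B[a]
            else:
                m[i][j] = C[a][a2] if same_position else D[a][a2]
    return m

def k_and_q(A, B, C, D, n):
    """The two h x h matrices whose determinants are K and Q in (3.5)."""
    h = len(A)
    K = [[0.0] * h for _ in range(h)]
    Q = [[0.0] * h for _ in range(h)]
    for a in range(h):
        for a2 in range(h):
            if a == a2:
                K[a][a] = A[a] - B[a]
                Q[a][a] = A[a] + (n - 1) * B[a]
            else:
                K[a][a2] = C[a][a2] - D[a][a2]
                Q[a][a2] = C[a][a2] + (n - 1) * D[a][a2]
    return K, Q

# Partition (3^2): h = 2 characteristics at n = 3 times (numbers are made up).
n = 3
A, B = [3.0, 4.0], [1.0, 0.5]
C = [[0.0, 0.6], [0.6, 0.0]]     # C[a][a'], diagonal entries unused
D = [[0.0, 0.2], [0.2, 0.0]]     # D[a][a'], diagonal entries unused
K, Q = k_and_q(A, B, C, D, n)
lhs = det(type2_matrix(A, B, C, D, n))
rhs = det(K) ** (n - 1) * det(Q)
print(math.isclose(lhs, rhs, rel_tol=1e-9))
```

One way to see why the factorization holds: each $n \times n$ block is a multiple of the identity plus a multiple of the all-ones array, and the all-ones array has the characteristic root $n$ once and $0$ with multiplicity $n - 1$.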


4. Method of obtaining the sample criteria. The probability distribution, 
$P$, of a simple random sample, say $O_N(X_{1\alpha}, X_{2\alpha}, \cdots, X_{t\alpha})$ 
($\alpha = 1, 2, \cdots, N$), from $\Pi$ is 

(4.1) $P = \pi^{-Nt/2}\,|G_{ij}|^{N/2} \exp\big[-\sum_{i,j,\alpha} G_{ij}(X_{i\alpha} - m_i)(X_{j\alpha} - m_j)\big]$. 

For $O_N$ fixed, $P$ is the likelihood function of the parameters $m_1, m_2, \cdots, m_t$ 
and $G_{ij}$ ($i, j = 1, 2, \cdots, t$). To obtain sample criteria for testing the $H$ and 
$\bar H$ hypotheses we shall employ the Neyman-Pearson likelihood-ratio method. The 
details of applying this method will be given for only one of the hypotheses, since 
the technique of application is the same for all the hypotheses under 
consideration. 

In applying the likelihood-ratio method we maximize $P$ under two different 
sets of conditions and form the ratio of the two maxima. To derive a criterion 
for, say, $H_1(mvc)$, we first maximize $P$ over the set, $\Omega$, of admissible values of 
the parameters in (4.1); secondly, we maximize $P$ over the set, $\omega$, of admissible 
values of the parameters in (4.1) that satisfy $H_1(mvc)$. Let $P_\Omega$ and $P_\omega$ be 
these maxima, respectively. The likelihood-ratio criterion for $H_1(mvc)$ is 
$\lambda_1(mvc) = P_\omega / P_\Omega$; thus $0 \le \lambda_1(mvc) \le 1$. The sample criterion, 
$L_1(mvc)$, for $H_1(mvc)$ will be chosen as a single-valued function of $\lambda_1(mvc)$. 

4a. Derivation of the criterion $L_1(mvc)$. The parameter spaces, $\Omega$ and $\omega$, 
can be specified as follows: 

$\Omega$: (1) $\|G_{ij}\|$ positive definite; 
          (2) $-\infty < m_i < +\infty$ ($i = 1, 2, \cdots, t$); 

$\omega$: (1) $\|A_{ij}\|$ positive definite and block symmetric (of type I); 
          (2) $-\infty < m_i < +\infty$, $(m_1, m_2, \cdots, m_t)$ block symmetric. 

The block symmetries in $\omega$(1) and $\omega$(2) are for the same partition 
$(1^b, n_1, \cdots, n_h)$ of the $t$ variates (see sections 2 and 3). 
Maximizing $P$ is equivalent to maximizing 

(4.2) $L = \ln P = -(Nt/2)\ln \pi + (N/2)\ln |G_{ij}| - \sum_{i,j,\alpha} G_{ij}(X_{i\alpha} - m_i)(X_{j\alpha} - m_j)$. 

Solving the simultaneous equations $\partial L/\partial m_i = 0$ ($i = 1, \cdots, t$) and 
$\partial L/\partial G_{ij} = 0$ ($i, j = 1, \cdots, t$; $i \le j$) for $m_i$ and $G^{ij}$, we have 

(4.3) 
$m_i = (1/N)\sum_{\alpha=1}^{N} X_{i\alpha} = \bar X_i$, 
$(N/2)G^{ij} = \sum_{\alpha=1}^{N} (X_{i\alpha} - \bar X_i)(X_{j\alpha} - \bar X_j) = v_{ij}$; 

substituting these values of the parameters into (4.1) we find that 

(4.4) $P_\Omega = \pi^{-Nt/2}\,(2/N)^{-Nt/2}\,|v_{ij}|^{-N/2} \exp[-Nt/2]$. 

In (4.3) and (4.5) each expression at the extreme right is defined by the corre- 
sponding expressions at the left. 

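In $\omega$(2) there are $b + h$ groups of means, the means within a group being all 
equal; let $m_s$ be the $s$-th mean and $m_{r_a}$ be the common value of the means in the 
$(b + a)$-th group. Solving the simultaneous equations $\partial L/\partial m_s = 0$, 
$\partial L/\partial m_{r_a} = 0$, $\partial L/\partial A_{ss'} = 0$, $\partial L/\partial C_{sa} = 0$, 
$\partial L/\partial A_a = 0$, $\partial L/\partial B_a = 0$, $\partial L/\partial D_{aa'} = 0$ 
($s, s' = 1, \cdots, b$; $a, a' = 1, \cdots, h$; $a \ne a'$), we find that 

(4.5) 
$m_s = \bar X_s$, 
$m_{r_a} = (1/Nn_a)\sum_{i_a,\alpha} X_{i_a\alpha} = \bar X_{r_a}$, 
$(N/2)A^{ss'} = \sum_{\alpha=1}^{N} (X_{s\alpha} - \bar X_s)(X_{s'\alpha} - \bar X_{s'}) = v_{ss'}$, 
$(N/2)C^{sa} = (1/n_a)\sum_{i_a,\alpha} (X_{s\alpha} - \bar X_s)(X_{i_a\alpha} - \bar X_{r_a}) = w_{sa}$, 
$(N/2)A^{a} = (1/n_a)\sum_{i_a,\alpha} (X_{i_a\alpha} - \bar X_{r_a})^2 = v_a$, 
$(N/2)B^{a} = [1/n_a(n_a - 1)]\sum_{\alpha,\, i_a \ne j_a} (X_{i_a\alpha} - \bar X_{r_a})(X_{j_a\alpha} - \bar X_{r_a}) = w_a$, 
$(N/2)D^{aa'} = (1/n_a n_{a'})\sum_{i_a, i_{a'}, \alpha} (X_{i_a\alpha} - \bar X_{r_a})(X_{i_{a'}\alpha} - \bar X_{r_{a'}}) = z_{aa'}$, 

where $i_a, j_a = b + \bar n_a + 1, \cdots, b + \bar n_{a+1}$; $i_a \ne j_a$; 
$\bar n_a = n_1 + \cdots + n_{a-1}$; $\bar n_1 = 0$; $a, a' = 1, \cdots, h$; $a \ne a'$. 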

When $H_1(mvc)$ is true, the maximum likelihood estimates of $m_i$, $\sigma_i$, and 
$\rho_{ij}$ ($i, j = 1, \cdots, t$) would be obtained by means of (4.5) and the definition of 
$\|A^{ij}\|$ given just after (3.1). 

Substituting the expressions in (4.5) into (4.1) we find that 

(4.6) $P_\omega = \pi^{-Nt/2}\,|\tilde v_{ij}|^{-N/2}\,(2/N)^{-Nt/2} \exp[-Nt/2]$, 

where 

(4.7) $\|\tilde v_{ij}\| = (N/2)\|A^{ij}\|$, the elements of $\|A^{ij}\|$ being given by (4.5). 

From (4.4) and (4.6) it follows that the likelihood-ratio criterion for $H_1(mvc)$ is: 

(4.8) $\lambda_1(mvc) = [\,|v_{ij}| / |\tilde v_{ij}|\,]^{N/2}$, ($i, j = 1, \cdots, t$). 

Finally, as the sample criterion for $H_1(mvc)$ we choose 

(4.9) $L_1(mvc) = [\lambda_1(mvc)]^{2/N} = |v_{ij}| / |\tilde v_{ij}|$. 
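
The computation of the criterion from a sample can be sketched as follows. This is an illustrative reading of (4.3), (4.5) and (4.9), not a transcription of the author's procedure: the function name `criterion_L1_mvc`, the 0-based index groups, and the small data set are all assumptions. Single-variate subsets are treated as groups of size one, so that averaging over a group reproduces $v_{ss'}$, $w_{sa}$, $v_a$, $w_a$ and $z_{aa'}$.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, value = len(a), 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[p][k] == 0.0:
            return 0.0
        if p != k:
            a[k], a[p] = a[p], a[k]
            value = -value
        value *= a[k][k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
    return value

def criterion_L1_mvc(rows, groups):
    """|v_ij| / |v~_ij| for a sample (list of rows) and 0-based index groups."""
    N, t = len(rows), len(rows[0])
    mean = [sum(r[i] for r in rows) / N for i in range(t)]
    # Pooled mean of each variate's subset (its own mean for a singleton).
    pooled = [0.0] * t
    for g in groups:
        m = sum(mean[i] for i in g) / len(g)
        for i in g:
            pooled[i] = m

    def scatter(center):
        return [[sum((r[i] - center[i]) * (r[j] - center[j]) for r in rows)
                 for j in range(t)] for i in range(t)]

    v = scatter(mean)      # sums of products about the individual means, (4.3)
    s = scatter(pooled)    # sums of products about the pooled means
    v_tilde = [[0.0] * t for _ in range(t)]
    for g in groups:
        for g2 in groups:
            if g is g2:
                diag = sum(s[i][i] for i in g) / len(g)
                off = 0.0
                if len(g) > 1:
                    off = sum(s[i][j] for i in g for j in g if i != j)
                    off /= len(g) * (len(g) - 1)
                for i in g:
                    for j in g:
                        v_tilde[i][j] = diag if i == j else off
            else:
                cross = sum(s[i][j] for i in g for j in g2) / (len(g) * len(g2))
                for i in g:
                    for j in g2:
                        v_tilde[i][j] = cross
    return det(v) / det(v_tilde)

# Made-up scores: N = 6 individuals, one criterion (column 0) and three forms.
rows = [
    [1.0, 2.1, 1.9, 2.5],
    [2.0, 3.2, 2.8, 3.1],
    [0.5, 1.0, 1.7, 1.2],
    [1.5, 2.9, 3.3, 2.2],
    [2.5, 3.0, 3.9, 4.1],
    [1.2, 1.8, 2.0, 2.9],
]
L = criterion_L1_mvc(rows, [[0], [1, 2, 3]])             # partition (1, 3)
L_trivial = criterion_L1_mvc(rows, [[0], [1], [2], [3]])  # no restriction imposed
print(0.0 < L <= 1.0, L_trivial == 1.0)
```

Values of the criterion near 1 are in agreement with the hypothesis; when every subset contains one variate the restricted and unrestricted maxima coincide and the criterion equals 1.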


4b. Preliminary calculations for evaluation of moments of $L_1(mvc)$. The 
determinant $|\tilde v_{ij}|$ in (4.9) is block symmetric. From (3.2), (3.3), and (4.9) it 
follows that: 

(4.10) $L_1(mvc) = |v_{ij}| \Big[\prod_{a=1}^{h} (v_a - w_a)^{-(n_a - 1)}\Big] |v'_{rr'}|^{-1}$, 

where 

$v'_{ss'} = v_{ss'}$; 
$v'_{s r_a} = w_{sa}\sqrt{n_a}$; 
$v'_{r_a r_a} = v_a + (n_a - 1)w_a$; 
$v'_{r_a r_{a'}} = \sqrt{n_a n_{a'}}\, z_{aa'}$, 

($s, s' = 1, \cdots, b$; $r, r' = 1, \cdots, b + h$; $r_a = b + a$; $a = 1, \cdots, h$). 
Let $Y_{i\alpha} = X_{i\alpha} - m_i$ and $\bar Y_i = (1/N)\sum_{\alpha=1}^{N} Y_{i\alpha}$ 
($i = 1, \cdots, t$). Clearly $v_{ij} = \sum_{\alpha=1}^{N} (Y_{i\alpha} - \bar Y_i)(Y_{j\alpha} - \bar Y_j)$. 
When $H_1(mvc)$ is true, $w_{sa}$, $v_a$, $w_a$, and $z_{aa'}$, in $L_1(mvc)$, can be expressed 
exactly as they are expressed in (4.5) with $Y$ substituted for $X$, and $(v_a - w_a)$ and 
$v'_{rr'}$ in (4.10) can be expressed as follows: 

(4.11) 
$v_a - w_a = (1/n_a)\big\{\sum_{i_a} v_{i_a i_a} - [1/(n_a - 1)]\sum_{i_a \ne j_a} v_{i_a j_a}\big\} + (N/n_a)\sum_{i_a} \bar Y_{i_a}^2 - [N/n_a(n_a - 1)]\sum_{i_a \ne j_a} \bar Y_{i_a}\bar Y_{j_a}$; 
$v'_{ss'} = v_{ss'}$; 
$v'_{s r_a} = (1/\sqrt{n_a})\sum_{i_a} v_{s i_a}$; 
$v'_{r_a r_a} = (1/n_a)\sum_{i_a, j_a} v_{i_a j_a}$; 
$v'_{r_a r_{a'}} = (1/\sqrt{n_a n_{a'}})\sum_{i_a, i_{a'}} v_{i_a i_{a'}}$. 

From (4.10) and (4.11) it follows that when $H_1(mvc)$ is true, each element of the 
determinants on which $L_1(mvc)$ depends consists of: (a) a quadratic form in $\bar Y_i$ 
and a linear function of the $v_{ij}$; or (b) merely a linear function of the 
$v_{ij}$ ($i, j = 1, \cdots, t$). 
The joint probability density function of the $v_{ij}$ and $\bar Y_i$ is 

(4.12) $f(v_{ij})\, g(\bar Y_1, \cdots, \bar Y_t)$, 

where 

$$f(v_{ij}) = \frac{|G_{ij}|^{(N-1)/2}\, |v_{ij}|^{(N-t-2)/2} \exp\big[-\sum_{i,j} G_{ij} v_{ij}\big]}{\pi^{t(t-1)/4}\, \Gamma\big(\frac{N-1}{2}\big)\, \Gamma\big(\frac{N-2}{2}\big) \cdots \Gamma\big(\frac{N-t}{2}\big)}$$ 

($\|G_{ij}\|$ positive definite; $N > t$), which is the Wishart distribution [9, p. 120], and 

$g(\bar Y_1, \cdots, \bar Y_t) = |G_{ij}|^{1/2}\, N^{t/2}\, \pi^{-t/2} \exp\big[-N \sum_{i,j} G_{ij} \bar Y_i \bar Y_j\big] = g(\bar Y)$, say, 

which is a normal $t$-variate distribution. The $d$-th moment ($d = 0, 1, \cdots$) 
of $L_1(mvc)$, when $H_1(mvc)$ is true, is 

(4.13) $E[L_1(mvc)]^d = \int_R f(v_{ij})\, g(\bar Y)\, |v_{ij}|^d\, |v'_{rr'}|^{-d} \Big[\prod_{a=1}^{h} (v_a - w_a)^{-d(n_a - 1)}\Big] \Big(\prod_{i=1}^{t} d\bar Y_i\Big) \prod_{i \le j} dv_{ij}$, 

where the domain, $R$, of integration is $-\infty < \bar Y_i < +\infty$, $\|v_{ij}\|$ positive 
semi-definite ($i, j = 1, \cdots, t$). The integral in (4.13) is evaluated in section 6 (by 
means of Wilks' moment-generating operators) for the case where $H_1(mvc)$ 
is true. 


5. Remarks on Wilks’ Moment-Generating Operators. Wilks’ operators 
are applicable to a far wider class of problems than those treated in this paper. 
The following discussion is confined to a special use of the operators. 

From (4.12) it follows that 

(5.1) $\int_{R'} |v_{ij}|^{(N-t-2)/2} \exp\big[-\sum_{i,j} G_{ij} v_{ij}\big] \prod_{i \le j} dv_{ij} = \pi^{t(t-1)/4} \Big\{\prod_{i=1}^{t} \Gamma[(N - i)/2]\Big\} |G_{ij}|^{-(N-1)/2}$, 

where $R'$ is the region in the space of $v_{ij}$ for which $\|v_{ij}\|$ is positive definite, and 
$\|G_{ij}\|$ is positive semi-definite. (Of course, the probability that $\|v_{ij}\|$ is not 
positive definite is 0.) Let $G'_{ij} = G_{ij} + \beta_{ij}$ ($i, j = 1, \cdots, t$); if all the 
$\beta_{ij}$ are sufficiently small, $\|G'_{ij}\|$ is positive definite, and we have 

(5.2) 
$$\frac{|G_{ij}|^{(N-1)/2} \int_{R'} |v_{ij}|^{(N-t-2)/2} \exp\big[-\sum_{i,j} G'_{ij} v_{ij}\big] \prod_{i \le j} dv_{ij}}{\pi^{t(t-1)/4} \prod_{i=1}^{t} \Gamma[(N - i)/2]} = |G_{ij}|^{(N-1)/2}\, |G'_{ij}|^{-(N-1)/2},$$ 

which is $E(g)$, where $g = \exp\big[-\sum_{i,j} \beta_{ij} v_{ij}\big]$. 


Let $J_{ij}$ be an operator (whose operand is a function of all the $\beta_{ij}$) which 
represents the following set of operations: (a) replacement of each $\beta_{ij}$ in the 
operand by $\beta_{ij} + \xi_i \xi_j$; (b) integration (of the result of (a)) with respect to 
$\xi_i$ ($i = 1, \cdots, t$) from $-\infty$ to $+\infty$; (c) multiplication of the result of (a) 
and (b) by $\pi^{-t/2}$. From (3.1) it follows that 

(5.3) $J_{ij}(g) = J_{ij}\big(\exp\big[-\sum_{i,j} \beta_{ij} v_{ij}\big]\big) = g\, |v_{ij}|^{-1/2}$ ($\|v_{ij}\|$ pos. def.); 

and if all the $\beta_{ij}$ are set equal to 0 after performing the $J$-operations, then 
$g = 1$ and (5.3) yields $|v_{ij}|^{-1/2}$. Let $J^{\lambda}_{ij}$ be $\lambda$ repetitions 
($\lambda = 1, 2, \cdots$) of $J_{ij}$. Clearly, 

(5.4) $E[J^{\lambda}_{ij}(g)]\big|_{\beta_{ij}=0} = E[g\, |v_{ij}|^{-\lambda/2}]\big|_{\beta_{ij}=0} = E[|v_{ij}|^{-\lambda/2}]$. 

Under all conditions of their use in this paper the $J$ operations are interchangeable 
with the $E$ operation [8; p. 316]; thus, 

$E[J_{ij}(g)] = J_{ij}[E(g)]$. 
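From (5.2), (5.4), and [8, pp. 318-320] we have 

(5.5) $E[|v_{ij}|^{-\lambda/2}] = |G_{ij}|^{(N-1)/2}\, J^{\lambda}_{ij} |G'_{ij}|^{-(N-1)/2}\big|_{\beta_{ij}=0} = |G_{ij}|^{\lambda/2} \prod_{i=1}^{t} \psi[N - i, -\lambda]$, 

where $N \ge t + \lambda + 1$ and $\psi(R, S) = \Gamma\big(\frac{R + S}{2}\big) \big/ \Gamma\big(\frac{R}{2}\big)$. 

The operator $J_{ij}$ may be used, as indicated above, to find negative half-integer 
moments of $|v_{ij}|$. To obtain positive half-integer moments of $|v_{ij}|$ we may 
use an inverse operator $J^{-\lambda}_{ij}$ [8, pp. 321-323] ($\lambda = 1, 2, \cdots$) which has 
been defined in such a way that 

(5.6) $|G_{ij}|^{(N-1)/2} \big\{J^{-\lambda}_{ij} |G'_{ij}|^{-(N-1)/2}\big\}\big|_{\beta_{ij}=0} = E[|v_{ij}|^{\lambda/2}] = |G_{ij}|^{-\lambda/2} \prod_{i=1}^{t} \psi[N - i, \lambda]$. 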


The equality between the second and third expressions in (5.6) can be obtained 
from (5.1) by replacing $N$ by $N + \lambda$ (see [7]). 

In (5.5) and (5.6) the $\beta$'s are not necessary; however, in (4.13) and in similar 
expressions for the moments of the other criteria there are several determinants; 
each determinant requires a distinct $J$-operator, and it is of great convenience to 
introduce a distinct set of $\beta$'s for each $J$ operator. The $\beta$'s associated with a 
given operator may initially appear in more than one of the determinants in the 
operand. The order in which several $J$-operators are used is illustrated in the 
following case for two: 

(5.7) $J^{\lambda}_{ij}\, |G'_{ij}|^{-k'} \big\{J^{\rho}_{ij}\, |G''_{ij}|^{-k''}\big\}_{\beta''_{ij}=0} \Big|_{\beta'_{ij}=0}$, 

where $\lambda, \rho > 0$ and the values of $k'$ and $k''$ are such that the value of the 
expression is well defined. The notation in (5.7) means that $J^{\rho}_{ij}$ is applied to 
$|G''_{ij}|^{-k''}$, the $\beta$'s associated with $J^{\rho}_{ij}$ are set equal to zero, then 
$J^{\lambda}_{ij}$ is applied to the product of $|G'_{ij}|^{-k'}$ and the results of the previous 
operations, and then the $\beta$'s associated with $J^{\lambda}_{ij}$ are set equal to zero. The 
interchangeability of the order of $J$ operations is discussed in [8, p. 324]. 


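6. The moments and distribution of $L_1(mvc)$ when $H_1(mvc)$ is true. To 
evaluate (4.13) we let 

(6.1) $g = \exp\big[-\sum_{i,j} \beta_{ij} v_{ij} - \sum_{a} \beta_a (v_a - w_a) - \sum_{r,r'} \beta'_{rr'} v'_{rr'}\big]$. 

From (4.11) and (4.12) we have 

(6.2) $E(g) = |A_{ij}|^{N/2}\, |A'_{ij}|^{-(N-1)/2}\, |A''_{ij}|^{-1/2}$, 

where 

$A'_{ss'} = A_{ss'} + \beta_{ss'} + \beta'_{ss'}$, 
$A'_{s i_a} = C_{sa} + \beta_{s i_a} + \beta'_{s r_a}/\sqrt{n_a}$, 
$A'_{i_a i_a} = A_a + \beta_{i_a i_a} + \beta_a/n_a + \beta'_{r_a r_a}/n_a$, 
$A'_{i_a j_a} = B_a + \beta_{i_a j_a} - \beta_a/(n_a - 1)n_a + \beta'_{r_a r_a}/n_a$, ($i_a \ne j_a$), 
$A'_{i_a i_{a'}} = D_{aa'} + \beta_{i_a i_{a'}} + \beta'_{r_a r_{a'}}/\sqrt{n_a n_{a'}}$, ($a \ne a'$), 
$A''_{ss'} = A_{ss'}$, 
$A''_{s i_a} = C_{sa}$, 
$A''_{i_a i_a} = A_a + \beta_a/n_a$, 
$A''_{i_a j_a} = B_a - \beta_a/n_a(n_a - 1)$, ($i_a \ne j_a$), 
$A''_{i_a i_{a'}} = D_{aa'}$, ($a \ne a'$). 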


When $H_1(mvc)$ is true, we have 

(6.3) 
$$E[L_1(mvc)]^d = |A_{ij}|^{N/2} \Big\{\prod_{a=1}^{h} J_a^{2d(n_a - 1)}\Big\} J_{rr'}^{2d}\, J_{ij}^{-2d}\, |A'_{ij}|^{-(N-1)/2}\, |A''_{ij}|^{-1/2} \Big|_{\beta = 0}$$ 
$$= \Big\{\prod_{i=1}^{t} \psi[N - i, 2d]\Big\} \Big\{\prod_{r=1}^{q} \psi[N + 2d - r, -2d]\Big\} \times \Big\{\prod_{a=1}^{h} \psi[(N + 2d)(n_a - 1), -2d(n_a - 1)]\Big\} \Big\{\prod_{a=1}^{h} (n_a - 1)^{d(n_a - 1)}\Big\},$$ 

($d = 0, 1, 2, \cdots$; $N > t$), 

where $q = b + h$ and $\psi(R, S)$ is defined in (5.5). In (6.3) the assumption that 
$H_1(mvc)$ is true implies that after we apply $J_{ij}^{-2d}$ and set the $\beta_{ij}$ equal to 0 
all remaining determinants are block symmetric; we may then use (3.3) before 
applying $J_{rr'}^{2d}$ and $J_a^{2d(n_a - 1)}$ ($a = 1, \cdots, h$). The expression in (6.3) may be 
written as follows: 


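(6.4) 
$$E[L_1(mvc)]^d = \Big\{\prod_{i=1+q}^{t} \frac{\Gamma\big(\frac{1}{2}(N - i) + d\big)}{\Gamma\big(\frac{1}{2}(N - i)\big)}\Big\} \Big\{\prod_{a=1}^{h} \frac{\Gamma\big(\frac{1}{2}N(n_a - 1)\big)}{\Gamma\big((\frac{1}{2}N + d)(n_a - 1)\big)}\Big\} \Big\{\prod_{a=1}^{h} (n_a - 1)^{d(n_a - 1)}\Big\}$$ 
$$= \prod_{a=1}^{h} \prod_{s_a=1}^{n_a - 1} \frac{\big(\frac{1}{2}(N - q - s_a - \bar n_a + a - 1)\big)_d}{\big(\frac{1}{2}N + \frac{s_a - 1}{n_a - 1}\big)_d},$$ 

where $\bar n_a$ is defined in (4.5) and $(T)_d = \Gamma(T + d)/\Gamma(T)$. 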

We now consider the problem of identifying from (6.4) the distribution of 
$L_1(mvc)$ (when $H_1(mvc)$ is true). Let $\theta$ be a beta variate, i.e., a variate whose 
c.d.f., $F(\theta)$, is 

(6.5) $F(\theta) = I_\theta(P, Q)$, ($0 < \theta < 1$; $P, Q > 0$), 

which is the Incomplete Beta Function ratio. $I_\theta(P, Q)$ is tabulated in [1] 
and [3]. The $d$-th moment of $\theta$ is: 

(6.6) $E(\theta)^d = \dfrac{\Gamma(P + d)}{\Gamma(P)}\, \dfrac{\Gamma(P + Q)}{\Gamma(P + Q + d)} = (P)_d/(P + Q)_d$, 

($d = 0, 1, \cdots$). Let 

(6.7) $\tau = \prod_{j=1}^{c} \theta_j$, ($c = 1, 2, \cdots$), 

where the $\theta_j$ ($j = 1, \cdots, c$) are mutually independent and each $\theta_j$ is a beta 
variate, having parameters $p_j$, $q_j$, say. The $d$-th moment of $\tau$ is 

(6.8) $E(\tau)^d = \prod_{j=1}^{c} (p_j)_d/(p_j + q_j)_d$, ($d = 0, 1, \cdots$). 
j= 
Given a variate, say $u$ ($0 < u < 1$), whose $d$-th moment ($d = 0, 1, \cdots$) is given 
by (6.8) we can infer by means of the solution of the Hausdorff problem of 
moments that $u$ and $\tau$ have the same exact probability distribution function (see 
Corollary 1.1 [2, p. 11]). It should be noted that (6.4) can be written as 

(6.9) $E[L_1(mvc)]^d = \prod_{a=1}^{h} \prod_{s_a=1}^{n_a - 1} \big\{(p_{a s_a})_d / (p_{a s_a} + q_{a s_a})_d\big\}$, 

where $p_{a s_a} = [(N - q - s_a - \bar n_a + a - 1)/2] > 0$, 
$q_{a s_a} = \Big[\dfrac{s_a - 1}{n_a - 1} + \dfrac{q + s_a + \bar n_a - a + 1}{2}\Big] > 0$; 

thus (6.4) is a special case of (6.8). 
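
The agreement between the gamma-function form (6.4) and the beta-product form (6.9) can be checked numerically. The sketch below evaluates both for the partition $(1, 2, 3)$ with $N = 12$; the function names and the chosen values of $N$, $b$ and the $n_a$ are assumptions made for illustration. The equality of the two forms rests on the Gauss multiplication formula for the gamma function.

```python
import math

def pochhammer(x, d):
    """(x)_d = Gamma(x + d) / Gamma(x), computed on the log scale."""
    return math.exp(math.lgamma(x + d) - math.lgamma(x))

def moment_gamma_form(N, b, sizes, d):
    """d-th moment of L_1(mvc) from the gamma-function form (6.4)."""
    q, t = b + len(sizes), b + sum(sizes)
    log_m = 0.0
    for i in range(q + 1, t + 1):
        log_m += math.lgamma((N - i) / 2 + d) - math.lgamma((N - i) / 2)
    for n in sizes:
        m = n - 1
        log_m += d * m * math.log(m)
        log_m += math.lgamma(N * m / 2) - math.lgamma((N / 2 + d) * m)
    return math.exp(log_m)

def beta_parameters(N, b, sizes):
    """The pairs (p, q) of the independent beta variates in (6.9)."""
    q = b + len(sizes)
    params, n_bar = [], 0            # n_bar = n_1 + ... + n_{a-1}
    for a, n in enumerate(sizes, start=1):
        for s in range(1, n):
            p = (N - q - s - n_bar + a - 1) / 2
            qq = (s - 1) / (n - 1) + (q + s + n_bar - a + 1) / 2
            params.append((p, qq))
        n_bar += n
    return params

def moment_beta_form(N, b, sizes, d):
    """d-th moment of a product of independent beta variates, as in (6.8)."""
    out = 1.0
    for p, qq in beta_parameters(N, b, sizes):
        out *= pochhammer(p, d) / pochhammer(p + qq, d)
    return out

# Partition (1, 2, 3), so t = 6 and q = 3; sample size N = 12 (made up).
N, b, sizes = 12, 1, [2, 3]
agree = all(
    math.isclose(moment_gamma_form(N, b, sizes, d),
                 moment_beta_form(N, b, sizes, d), rel_tol=1e-9)
    for d in (0, 1, 2, 3)
)
print(agree)
```

The number of beta variates in the product is $t - q = \sum_a (n_a - 1)$, one for each pair $(a, s_a)$.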


The exact probability (density) function, say $g(\tau)$, of $\tau$ has been obtained by 
Wilks [7, p. 475] and is: 


al 1 
-_pe-l —n—1 -1 qo—l 12 
g(r) = Kr" (1 - fete | -- | ea --> iz 
0 0 


x (1 — »,)F-*%-1* (1 — )Fe-2 2 w.. (1 — o_O 
(6.10) x (1 — w(1 — 772-8 [1 — fr + vo(1 — 1) }(1 — 7)? .. 
[1 — {01 + oo(1 — 1) +--+ + 0.a(1 — m)(1 — v2) «++ (1 — ¢-2)} 
(1 — r)Pet Peay 


c—1 
x [I ae;, 
7=1 
I(p; + 4)  - 
where A = II a Za (pe-;' + Qe—i'); 
7’/=0 


71 L I'(p,)P'(q) 


= >> p.-;. An approximation of the distribution of a product of inde- 
j/=0 


pendent beta variates by the distribution of a single beta variate is given in [4]. 

The results of this section may be summarized as follows: If $H_1(mvc)$ is true,
the $d$-th moment $(d = 0, 1, \cdots)$ of the exact distribution of $L_1(mvc)$ is given by
(6.4). Also, if $H_1(mvc)$ is true, the exact distribution of $L_1(mvc)$ is given by
(6.10), where the $p_j$, $q_j$, and $c$ can be specified by means of (6.4). The cumulative
distribution of $L_1(mvc)$ is given for certain special cases in section 7g.


7. Single Sample Criteria. The solutions of problems (i) and (ii) (see section
1) for $H_1(mvc)$ are contained in (4.9) and the summary at the end of section 6.
In the present section solutions of problems (i) and (ii) are given for each of the
remaining two $H_1$ hypotheses and the three $\bar{H}_1$ hypotheses (all of which are stated
in section 2). For any of the hypotheses the sample criterion is chosen as a
single-valued function of the likelihood-ratio criterion for the hypothesis. The
methods of determining the moments and identifying the distribution of each
sample criterion (when the corresponding null hypothesis is true) are entirely
similar to those used in sections 4, 5, and 6 in regard to $H_1(mvc)$. Section 7g
gives the exact distributions of the single-sample criteria for certain special
compound symmetries.

Each criterion discussed in this section is based on a sample

$O_N(X_{1\alpha}, X_{2\alpha}, \cdots, X_{t\alpha}) \quad (\alpha = 1, \cdots, N;\ N > t)$

of size $N$ from a normal $t$-variate distribution $(t = 3, 4, \cdots)$. As in the case of
$H_1(mvc)$, it is presupposed for testing $H_1(vc)$ or $H_1(m)$ that there is a certain
partition $(1^b, n_1, n_2, \cdots, n_h)$ of the $t$ variates; for testing $\bar{H}_1(mvc)$, $\bar{H}_1(vc)$,
or $\bar{H}_1(m)$ it is presupposed that there is a certain partition $(n^h)$ of the $t$ variates
(see sections 2 and 3).


7a. The test $L_1(vc)$ for the hypothesis $H_1(vc)$. For the sample criterion for
$H_1(vc)$ we choose

(7.1) $L_1(vc) = [\lambda_1(vc)]^{2/N} = |v_{ij}| / |\tilde{v}_{ij}|, \quad (i, j = 1, \cdots, t),$

where $\lambda_1(vc)$ is the likelihood-ratio criterion for $H_1(vc)$, $v_{ij}$ is defined in (4.3), and

$\tilde{v}_{ss'} = v_{ss'},$
$\tilde{v}_{s i_a} = (1/n_a) \sum_{j_a} v_{s j_a},$
$\tilde{v}_{i_a i_a} = (1/n_a) \sum_{j_a} v_{j_a j_a},$
$\tilde{v}_{i_a j_a} = [1/n_a(n_a - 1)] \sum_{i'_a \ne j'_a} v_{i'_a j'_a}, \quad (i_a \ne j_a),$
$\tilde{v}_{i_a i_{a'}} = (1/n_a n_{a'}) \sum_{j_a, j_{a'}} v_{j_a j_{a'}},$

$(s, s' = 1, \cdots, b;\ a, a' = 1, \cdots, h;\ a \ne a';\ i_a, j_a = b + \bar{n}_a + 1, \cdots, b + \bar{n}_{a+1};\ \bar{n}_a = n_1 + \cdots + n_{a-1};\ \bar{n}_1 = 0)$. Since $||\tilde{v}_{ij}||$ is a block symmetric
matrix, there is an expression for $|\tilde{v}_{ij}|$ that is entirely similar in form to the
expression in (3.3) for $|A_{ij}|$ (see also (4.9) and (4.10)).

If H,(vc) is true, 


E{L,(ve)* = ne WN — i, 2a)} 
{I vI(N — 1+ 2d)(ne — 1), —2d(mo — on} 


x {IL viv — r+ 24, ~2al} {TE on. — vere} 


r=1 


(7.2) 


N —q—5%— +a — 1\) 
A da lt Boal 
ia Il Il " (55 - (sa — 1) "ae ( = —? ied 


- (ne — 1) Ja 





where $q = b + h$ and $\psi(R, S)$, $\bar{n}_a$, and $(T)_d$ are defined in (5.5), (4.5), and (6.4),
respectively. From (7.2) and the argument given after (6.8) it follows that
if $H_1(vc)$ is true, the exact distribution of $L_1(vc)$ is given by (6.10), where the
$p_j$, $q_j$, and $c$ can be specified by means of (7.2).

7b. The test $L_1(m)$ for the hypothesis $H_1(m)$. For the sample criterion for
$H_1(m)$ we choose

(7.3) $L_1(m) = [\lambda_1(m)]^{2/N} = |\tilde{v}_{ij}| / |v'_{ij}|, \quad (i, j = 1, \cdots, t),$

where $\lambda_1(m)$ is the likelihood-ratio criterion for $H_1(m)$ and $v'_{ij}$ and $\tilde{v}_{ij}$ are defined
in (4.7) and (7.1), respectively. In passing we note that

(7.4) $[L_1(m)][L_1(vc)] = L_1(mvc).$

If $H_1(m)$ is true,
h 
E[L,(m)/* = 


a 


X wi(ma — 1)(N + 2a), —2a(n. — 1))} 


_ = tN) 
h ra—l - + a} 
} Z Ne — 1 al 


If $H_1(m)$ is true, the exact distribution of $L_1(m)$ is given by (6.10), where the
$p_j$, $q_j$ and $c$ can be specified by means of (7.5). It follows from (7.5) that the
exact distribution of $L_1(m)$, when $H_1(m)$ is true, does not depend on $b$.


{yl(N om 1)(a nied 1), 2a(na — 1)] 


1 


(dd = 0,1,---). 


7c. The test $\bar{L}_1(mvc)$ for the hypothesis $\bar{H}_1(mvc)$. The sample criterion, $\bar{L}_1(mvc)$,
for $\bar{H}_1(mvc)$ (see section 2) is

(7.6) $\bar{L}_1(mvc) = [\bar{\lambda}_1(mvc)]^{2/N} = |v_{ij}| / |\bar{v}'_{ij}|, \quad (i, j = 1, \cdots, t),$

where $\bar{\lambda}_1(mvc)$ is the likelihood-ratio criterion for $\bar{H}_1(mvc)$, $v_{ij}$ is defined in (4.3),
and


, 


0i.%, — (1/n) 2 (X j,a Pa Xa); 


asJa 
Digi, = U/n(n — 1) DS (Kite — XD(Xjrze — XQ), (ig ¥ ja)s 
isis 
Dine = (i/n) DY (Xie — Xi)(Xtsea — Xz), 
aJjake’ 


(ka. = ja + nla’ — a);a # a’), 
din, = [1/n(n — 1)] a (Xiga — Ka)(Xnsva — Xa’), 


a Jasha’ 


(h., jg + nla —a’');a # a’), 


$(a = 1, \cdots, h;\ i_a, j_a, h_a, k_a = (a - 1)n + 1, \cdots, an;\ k_{a'} = i_a + n(a' - a);\ h_{a'} \ne i_a + n(a' - a);\ \alpha = 1, \cdots, N)$. $||\bar{v}'_{ij}||$ is a block symmetric matrix,
of type II (see (3.4)), in which the blocks are formed by a partition $(n^h)$ $(t = nh)$
of the rows and columns; there is an expression for $|\bar{v}'_{ij}|$ that is entirely similar
in form to the expression in (3.5) for $|A_{ij}|$.

If $\bar{H}_1(mvc)$ is true,


E[L,(mvc)|? = (n — 1)** “ II WN — i, 2d)} 


i=ht1 
x ne VI(N + 2d)(n — 1) +1 —a, —2d(n — 1} 
a=1 J 


(7.7) I( —h— eo (n — Ia - ») 
2 a\ 


a=1 =I | (NS l—a. s—1\ (’ 
a * 3-1 t a! | 


(d = 0,1, ---). 





If $\bar{H}_1(mvc)$ is true, the exact distribution of $\bar{L}_1(mvc)$ is given by (6.10), where
the $p_j$, $q_j$ and $c$ can be specified by means of (7.7).


7d. The test $\bar{L}_1(vc)$ for the hypothesis $\bar{H}_1(vc)$. The sample criterion, $\bar{L}_1(vc)$, for
$\bar{H}_1(vc)$ (see section 2) is

(7.8) $\bar{L}_1(vc) = [\bar{\lambda}_1(vc)]^{2/N} = |v_{ij}| / |\bar{\tilde{v}}_{ij}|, \quad (i, j = 1, \cdots, t),$

where $\bar{\lambda}_1(vc)$ is the likelihood-ratio criterion for $\bar{H}_1(vc)$, $v_{ij}$ is defined in (4.3), and


Visig = (1/n) DE Visia» 
Ja 


Digi, = U/n(n — 1) DO viz, (ia # ja), 
iA is 
Dijk.” = (1/n) = Vigkge » (ko. = ja + n(a’ — a); a Xa’), 
Jaki? 
Distar = [1/n(n — 1)] Do vjeng-, (hae ja + nla’ — a); a # a’), 
Jaha’ 


where the ranges of $a$, $i_a$, $j_a$, $h_a$, $k_a$ are given in (7.6). There is an expression
for $|\bar{\tilde{v}}_{ij}|$ which is entirely similar in form to the expression in (3.5) for $|A_{ij}|$.
If $\bar{H}_1(vc)$ is true,





E[Ly(ve)? = (n — 1)" | II wn -3, 24) | 
i=h+1 


x {Dw —1+ 2d)(n — 1) +1 — a, —2d (n — »\ 


- ‘é ~h-se~ @ -~ Ue | 
) ; 4.0 @ =0,1,---). 


AY wot ime yeni) | 

( 2 2n—1) n-—1/a} 
If $\bar{H}_1(vc)$ is true, the exact distribution of $\bar{L}_1(vc)$ is given by (6.10), where the
$p_j$, $q_j$ and $c$ can be specified by means of (7.9).


(7.9)

7e. The test $\bar{L}_1(m)$ for the hypothesis $\bar{H}_1(m)$. The sample criterion, $\bar{L}_1(m)$,
for $\bar{H}_1(m)$ (see section 2) is

(7.10) $\bar{L}_1(m) = [\bar{\lambda}_1(m)]^{2/N} = \bar{L}_1(mvc) / \bar{L}_1(vc) = |\bar{\tilde{v}}_{ij}| / |\bar{v}'_{ij}|,$

where $\bar{\lambda}_1(m)$ is the likelihood-ratio criterion for $\bar{H}_1(m)$ and $||\bar{\tilde{v}}_{ij}||$ and $||\bar{v}'_{ij}||$ are
given in (7.8) and (7.6), respectively.
If $\bar{H}_1(m)$ is true, the $d$-th moment $(d = 0, 1, \cdots)$ of $\bar{L}_1(m)$ is





=1| (N l—-—a s—l 
| ln Sicrcadalle | 
{ (F+54—% +224), J 
(dd = 0,1,---). 


If $\bar{H}_1(m)$ is true, the exact distribution of $\bar{L}_1(m)$ is given by (6.10) where the
$p_j$, $q_j$ and $c$ can be specified by means of (7.11).


ess 5 +6 


(7.11) 





7f. Relations among $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ and among $\bar{L}_1(mvc)$, $\bar{L}_1(vc)$,
and $\bar{L}_1(m)$. $L_1(mvc)$ is the product of $L_1(vc)$ and $L_1(m)$ (see (7.4)); moreover,
when $H_1(mvc)$ is true, the $d$-th moment $(d = 0, 1, \cdots)$ of $L_1(mvc)$ equals the
product of the $d$-th moments of $L_1(vc)$ and $L_1(m)$ (see (6.4), (7.2), and (7.5)).
From this result and the argument given after (6.8) it follows that when $H_1(mvc)$
is true, $L_1(mvc)$ is the product of two independent chance quantities, namely,
$L_1(vc)$ and $L_1(m)$. Similarly, when $\bar{H}_1(mvc)$ is true, $\bar{L}_1(mvc)$ is the product of
two independent chance quantities, namely, $\bar{L}_1(vc)$ and $\bar{L}_1(m)$.















7g. Exact distributions of single sample criteria in special cases. For a sample
of size $N$ and a partition $(1^b, n_1, \cdots, n_h)$ of the $t$ variates of $\Pi$ (see section 2)
let the cumulative distribution function (c.d.f.) of $L_1(mvc)$, when $H_1(mvc)$ is
true, be

(7.12) $F(u \mid 1^b, n_1, \cdots, n_h \mid N) = \text{Prob}\,\{L_1(mvc) \le u\};$

also, let $F(y \mid 1^b, n_1, \cdots, n_h \mid N)$ and $F(z \mid 1^b, n_1, \cdots, n_h \mid N)$ be the c.d.f.'s
of $L_1(vc)$ and $L_1(m)$ when $H_1(vc)$ and $H_1(m)$ are true, respectively. Let
$F(\bar{u} \mid n^h \mid N)$, $F(\bar{y} \mid n^h \mid N)$, and $F(\bar{z} \mid n^h \mid N)$ be the c.d.f.'s of $\bar{L}_1(mvc)$, $\bar{L}_1(vc)$,
and $\bar{L}_1(m)$ when $\bar{H}_1(mvc)$, $\bar{H}_1(vc)$ and $\bar{H}_1(m)$ are true, respectively.

It can be shown that

(7.13)
$F(u \mid 1^b, 2 \mid N) = I_u[(N - b - 2)/2,\ (b + 2)/2],$
$F(u \mid 1^b, 3 \mid N) = I_{\sqrt{u}}[N - b - 3,\ b + 3],$
$F(y \mid 1^b, 2 \mid N) = I_y[(N - b - 2)/2,\ (b + 1)/2],$
$F(y \mid 1^b, 3 \mid N) = I_{\sqrt{y}}[N - b - 3,\ b + 2],$
$F(z \mid 1^b, n \mid N) = I_{z'}[(N - 1)(n - 1)/2,\ (n - 1)/2], \quad [z' = z^{1/(n-1)}],$
$F(\bar{u} \mid 2^2 \mid N) = I_{\sqrt{\bar{u}}}[N - 4,\ 3],$
$F(\bar{y} \mid 2^2 \mid N) = I_{\sqrt{\bar{y}}}[N - 4,\ 2],$
$F(\bar{z} \mid n^2 \mid N) = I_{\bar{z}'}[(N - 1)(n - 1) - 1,\ n - 1], \quad [\bar{z}' = \bar{z}^{1/[2(n-1)]}],$

where $I_\theta(P, Q)$ is defined in (6.5).
Distributions of the criteria in certain cases where the normal distribution is
completely symmetric (see section 2) are given in [5].


7h. Asymptotic distributions of the single sample criteria. When the sample
size, $N$, is large, we may use a theorem [6] (see also [9, pp. 151-2]) concerning
the approximate distribution of the likelihood-ratio criterion. For large $N$ the
distributions of the quantities $-N \ln L_1(mvc)$, $-N \ln L_1(vc)$, and $-N \ln L_1(m)$
(when $H_1(mvc)$, $H_1(vc)$, and $H_1(m)$, respectively, are true) are approximately
chi-square distributions with $(1/2)[t(t + 3) - b(b + 3) - h(h + 5)] - hb$,
$(1/2)[t(t + 1) - b(b + 1) - h(h + 3)] - hb$, and $t - b - h$ degrees of freedom,
respectively. Also, for large $N$ the distributions of the quantities
$-N \ln \bar{L}_1(mvc)$, $-N \ln \bar{L}_1(vc)$, and $-N \ln \bar{L}_1(m)$ (when $\bar{H}_1(mvc)$, $\bar{H}_1(vc)$, and
$\bar{H}_1(m)$, respectively, are true) are approximately chi-square distributions with
$[t(t + 3)/2 - h(h + 2)]$, $[t(t + 1)/2 - h(h + 1)]$, and $t - h$ degrees of freedom,
respectively.
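The degrees-of-freedom expressions above are easy to tabulate. The sketch below is illustrative only: the function names are hypothetical, and the partition sizes and the value .9214 are those quoted for the two illustrative examples of section 9. In both cases the degrees of freedom for the "mvc" criterion equal the sum of those for the "vc" and "m" criteria, as expected from the product relation of section 7f.

```python
import math


def df_type1(t, b, h):
    """Degrees of freedom for -N ln L_1(mvc), -N ln L_1(vc), -N ln L_1(m)
    under a partition (1^b, n_1, ..., n_h) of the t variates."""
    df_mvc = (t * (t + 3) - b * (b + 3) - h * (h + 5)) // 2 - h * b
    df_vc = (t * (t + 1) - b * (b + 1) - h * (h + 3)) // 2 - h * b
    df_m = t - b - h
    return df_mvc, df_vc, df_m


def df_type2(t, h):
    """Degrees of freedom for the barred criteria under a partition (n^h)."""
    df_mvc = t * (t + 3) // 2 - h * (h + 2)
    df_vc = t * (t + 1) // 2 - h * (h + 1)
    df_m = t - h
    return df_mvc, df_vc, df_m


# Partition (1, 3) of four variates: t = 4, b = 1, h = 1.
dfs_example1 = df_type1(4, 1, 1)
# Partition (2^2) of four variates: t = 4, h = 2.
dfs_example2 = df_type2(4, 2)

# Large-sample statistic -N ln L_1(mvc) for N = 126 and L_1(mvc) = .9214.
statistic = -126 * math.log(0.9214)
```

The statistic would then be referred to a chi-square table with `dfs_example1[0]` degrees of freedom; no tail probability is computed here.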


8. k-Sample Criteria. In this section solutions of problems (i) and (ii) (see
section 1) are given for the three $H_k$ and the three $\bar{H}_k$ hypotheses (all stated
in section 2).

A test of any of these hypotheses is based on $k$ simple, random samples $(k \ge 2)$
from $k$ compound-symmetric, normal $t$-variate distributions. The probability
density function, $Q$, of the $k$ samples, say, $O_{N_p}$ $(p = 1, \cdots, k;\ N_p > b + h)$ is


(8.1) $Q = \pi^{-N't/2} \left\{\prod_{p=1}^{k} |G_{ij,p}|^{N_p/2}\right\} \exp\left[-\sum_{i,j,\alpha_p,p} G_{ij,p}(X_{i\alpha_p} - m_{i,p})(X_{j\alpha_p} - m_{j,p})\right],$

$(N' = \sum_{p=1}^{k} N_p;\ i, j = 1, \cdots, t)$, where $X_{i\alpha_p}$ is the $\alpha_p$-th sample value of the
$i$-th variate in the $p$-th population $(\alpha_p = 1, \cdots, N_p)$, $m_{i,p}$ is the mean (expected
value) of the $i$-th variate in the $p$-th population, and $(1/2)\,||G_{ij,p}||^{-1}$ is the
variance-covariance matrix of the variates in the $p$-th population (see (3.1)).
For a given set of $k$ samples $Q$ is the likelihood function of the parameters
$G_{ij,p}$ and $m_{i,p}$ $(i, j = 1, \cdots, t;\ p = 1, \cdots, k)$.

The six hypotheses under consideration (see section 2) can be restated in terms
of $G_{ij,p}$ and $m_{i,p}$; e.g., $H_k(MVC \mid mvc)$ asserts that $m_{i,1} = m_{i,2} = \cdots = m_{i,k}$
and $||G_{ij,1}|| = ||G_{ij,2}|| = \cdots = ||G_{ij,k}||$ given that for all $p$ the vector
$(m_{1,p}, \cdots, m_{t,p})$ is block symmetric and the matrix $||G_{ij,p}||$ is block symmetric
(of type I) for a preassigned partition $(1^b, n_1, \cdots, n_h)$ of the $t$ variates (see
sections 2 and 3).


8a. Expressions for the criteria. Let $\lambda_k(MVC \mid mvc), \cdots, \bar{\lambda}_k(M \mid mVC)$ represent
the likelihood-ratio criteria for the six hypotheses $H_k(MVC \mid mvc), \cdots,$
$\bar{H}_k(M \mid mVC)$ respectively, and let $L_k(MVC \mid mvc), \cdots, \bar{L}_k(M \mid mVC)$ be the
sample criteria for the respective hypotheses. We choose the $L_k$ as follows:

(8.2)
$L_k(MVC \mid mvc) = [\lambda_k(MVC \mid mvc)]^2,$
$L_k(VC \mid mvc) = [\lambda_k(VC \mid mvc)]^2,$
$L_k(M \mid mVC) = [\lambda_k(M \mid mVC)]^{2/N'} = \{L_k(MVC \mid mvc) / L_k(VC \mid mvc)\}^{1/N'};$

the expressions for $\bar{L}_k(MVC \mid mvc)$, $\bar{L}_k(VC \mid mvc)$, and $\bar{L}_k(M \mid mVC)$ are the same
as those in (8.2) with $\lambda_k$ replaced by $\bar{\lambda}_k$. The $\lambda_k$ and $\bar{\lambda}_k$ can be obtained explicitly
by straightforward application of the likelihood-ratio method (see the paragraph
preceding section 4a).


8b. Moments of the k-sample criteria. The exact distribution of any of the
k-sample criteria, when the corresponding null hypothesis is true, is given in
(6.10), where the quantities $p_j$, $q_j$, and $c$ can be specified by means of the moment
expressions given below. The moments have been obtained by means of the
operators discussed in Section 5.

For each of the following six moment expressions the null hypothesis, corresponding
to the sample criterion involved, is assumed to be true:


(5 - f+ =”), 
G - et Se), J 
Ny("a—1) a ) 
Ht Gt wen), 

G + Hie 5) 7 


of (utp — ) 
oN, + Np a 


' @4r~1) . &«+ 


E(L.(MVC | mvc)|? 


E(L.(VC | mvc)}" 
Nee) (| (uy — 1) \_
I (5 * Nylte = Dh, 
(u’ — 1) | 

2 + N'(Ne — J) } 


E[L.(M |mVC)|" = 


\ 


a (up — 1) 
+ oe ) 
E[L,(MVC | movc)]? a __2N> ___ Be d | 


a u—l 
~ ON’ + N’ : 


[-« (u» — 1) 
? * aN, — 1)” N,(n — 1 *), 


i ad (u! — 1) 
t ie —) * Na ), 


a a 
- ay. + “e), 


) 
’C | moc))’ = <¢ 2 a Vp =i) | 





|e Oo. (u — a 
2N’ 
[=< (up — 1) 
+ 2N p(n - — 1) + N,(n — >), 





i—@ (u’ — 1) 
2N'(n - —_ 1) + N'(n — AD 
E(L,(M | mV C)|" 


where $d = 0, 1, \cdots$ and $(T)_d$ is defined in (6.4).


8c. Comments on the criteria. By an argument similar to that used in section 7f
it follows from (8.3) that when $H_k(MVC \mid mvc)$ is true $L_k(MVC \mid mvc)$ is
the product of two independently distributed chance quantities, namely,
$L_k(VC \mid mvc)$ and $[L_k(M \mid mVC)]^{N'}$. The same assertion holds true if we replace
each $L$ by $\bar{L}$ and $H$ by $\bar{H}$.

Exact distributions of the k-sample criteria, when the corresponding null
hypotheses are true, can be obtained explicitly for special values of $k$ and special
compound symmetries; but owing to lack of space we shall not consider them
in this paper.

When the sample size $N'$ is large, the exact distributions of
$-\ln L_k(MVC \mid mvc)$, $-\ln L_k(VC \mid mvc)$, $-N' \ln L_k(M \mid mVC)$,
$-\ln \bar{L}_k(MVC \mid mvc)$, $-\ln \bar{L}_k(VC \mid mvc)$,
and $-N' \ln \bar{L}_k(M \mid mVC)$ (if the corresponding null hypotheses, respectively,
are true) are approximately chi-square distributions with

$(k - 1)[b(b + 3)/2 + hb + h(h + 5)/2],$

$(k - 1)[b(b + 1)/2 + hb + h(h + 3)/2],$

$q(k - 1)$, $h(h + 2)(k - 1)$, $h(h + 1)(k - 1)$, and $h(k - 1)$ degrees of freedom,
respectively.

















9. Illustrative examples. The first of the following two examples² illustrates
the use of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ in a psychometrics experiment; the second
example illustrates the use of $\bar{L}_1(mvc)$, $\bar{L}_1(vc)$, and $\bar{L}_1(m)$ in a medical-research
experiment (see section 1).

EXAMPLE 1. In an experiment to establish methods of obtaining reader 
reliability in regard to essay scoring, 126 examinees were given a three-part 
English Composition examination. Each part required that the examinee write 
an essay, and for each examinee four scores were obtained on the following four 
things, respectively: (1) the part-2 and part-3 essays together, (2) the original 
part-1 essay, (3) a long-hand copy of the part-1 essay, (4) a carbon copy of the 
long-hand copy in (3). Scores were assigned by a group of “English Readers” 
using procedures designed to counterbalance certain experimental conditions. 
The score on (1) serves as a criterion. The experimenter asks whether on the 
basis of the sample (of size 126) the quantities associated with (2), (3), and (4) 
can be considered as interchangeable among themselves and interchangeable 
with respect to their relation to the criterion (1). 

Let $X_1$, $X_2$, $X_3$, and $X_4$ be the scores on (1), (2), (3), and (4), respectively.
It is assumed that $(X_1, X_2, X_3, X_4)$ has a normal 4-variate distribution and
that the set of scores $(X_{1\alpha}, X_{2\alpha}, X_{3\alpha}, X_{4\alpha})$ $(\alpha = 1, \cdots, 126)$ obtained from
the essays is a random sample of values of $(X_1, X_2, X_3, X_4)$. The following
three questions will be considered (see section 2), where the grouping of the four
variates is (1, 3): (a) Is the sample consistent with the hypothesis $H_1(mvc)$?
(b) Is the sample consistent with the hypothesis $H_1(vc)$? (c) Is the sample
consistent with the hypothesis $H_1(m)$? In the particular experiment under
discussion (a) is the experimenter's question.























² Mr. L. R. Tucker (Educational Testing Service, Princeton, New Jersey) and Captain
J. Allan Rafferty, M.D. (Air University School of Aviation Medicine, Randolph Field,
Texas) kindly gave the author the data for Examples 1 and 2, respectively.

The sample means and variance-covariance matrix are as follows:

            X_1       X_2       X_3       X_4
        77.8976   20.9425   23.4544   18.0384
        20.9425   25.0704   12.4363   11.7257
        23.4544   12.4363   28.2021    9.2281
        18.0384   11.7257    9.2281   22.7390

Means   28.0556   14.9048   15.4841   14.4444

This matrix is $(1/126)\,||v_{ij}||$ $(i, j = 1, \cdots, 4)$ (see (4.3)). The sample criteria
$L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ will be used to answer questions (a), (b), and (c),
respectively. The values of the criteria can be computed from the values of
$|v_{ij}|$, $|v'_{ij}|$, and $|\tilde{v}_{ij}|$ (see (4.9), (7.1), (7.3)), where $v'_{ij}$ is given in (4.7) and
$\tilde{v}_{ij}$ is given below (7.1). The $\tilde{v}_{ij}$ $(i \ne 1 \ne j)$ are evaluated by simple averaging
of certain elements in $||v_{ij}||$. Both $|v'_{ij}|$ and $|\tilde{v}_{ij}|$ have the block pattern
of (3.2) and can be expressed in the simplified form of (3.3), where $h = 1$ and
$n_1 = 3$; the simplified form of $|v'_{ij}|$ can also be obtained from (4.10) and (4.11).
From the data above it is found that


$L_1(mvc) = |v_{ij}| / |v'_{ij}| = .9214,$
$L_1(vc) = |v_{ij}| / |\tilde{v}_{ij}| = .9568,$
$L_1(m) = |\tilde{v}_{ij}| / |v'_{ij}| = .9630.$
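The criterion $L_1(vc)$ can be recomputed from the matrix printed above. The following is a minimal sketch under a stated assumption: the averaged matrix is taken to replace, within the set $(X_2, X_3, X_4)$, the variances by their mean, the covariances by their mean, and the covariances with $X_1$ by their mean. The helper names are hypothetical. Because the factor 1/126 cancels in the ratio of determinants, the printed matrix is used directly.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def averaged_matrix(s, members):
    """Average variances, covariances and cross-covariances over one set."""
    m = len(members)
    others = [i for i in range(len(s)) if i not in members]
    out = [row[:] for row in s]
    diag = sum(s[i][i] for i in members) / m
    off = sum(s[i][j] for i in members for j in members if i != j) / (m * (m - 1))
    for i in members:
        for j in members:
            out[i][j] = diag if i == j else off
    for k in others:
        cross = sum(s[k][j] for j in members) / m
        for j in members:
            out[k][j] = cross
            out[j][k] = cross
    return out


# Variance-covariance matrix of Example 1 (rows and columns X_1, ..., X_4).
S = [
    [77.8976, 20.9425, 23.4544, 18.0384],
    [20.9425, 25.0704, 12.4363, 11.7257],
    [23.4544, 12.4363, 28.2021, 9.2281],
    [18.0384, 11.7257, 9.2281, 22.7390],
]

# Grouping (1, 3): X_1 stands alone, X_2, X_3, X_4 form the set of three.
l1_vc = determinant(S) / determinant(averaged_matrix(S, [1, 2, 3]))
```

The ratio `l1_vc` can be compared with the value quoted for $L_1(vc)$ in the text. The criteria involving the means are not recomputed here, since they also require the matrix defined in (4.7).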


The second, fourth, and fifth formulas in (7.13) (for $N = 126$, $b = 1$, $n = 3$)
give the distributions of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$, respectively (when the
hypothesis with which the criterion is associated is true). By direct computation
with expressions for the Incomplete Beta Function ratios the per cent points
corresponding to the observed values of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ are found
to be .26, .49, and .09, respectively. Thus at the 5% significance level the
answer to any given one of the three questions (a), (b), (c) is yes. Critical
values of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ for various significance levels can be obtained
from [3] by interpolation.
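The "direct computation" mentioned above can be sketched as follows. For positive integer parameters the Incomplete Beta Function ratio equals a binomial tail probability, $I_x(a, b) = \text{Prob}\{Y \ge a\}$ with $Y$ binomial on $a + b - 1$ trials and success probability $x$; this identity is used here in place of tables. The function name is hypothetical, and the arguments assume that the relevant formulas of (7.13) read $I_{\sqrt{u}}[N - b - 3, b + 3]$, $I_{\sqrt{y}}[N - b - 3, b + 2]$, and $I_{z'}[(N - 1)(n - 1)/2, (n - 1)/2]$ with $z' = z^{1/(n-1)}$.

```python
import math


def inc_beta_int(x, a, b):
    """Incomplete Beta Function ratio I_x(a, b) for positive integers a, b,
    computed as the upper tail of a binomial(a + b - 1, x) distribution."""
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(a, n + 1))


N, b, n = 126, 1, 3

# Observed criteria from Example 1.
l_mvc, l_vc, l_m = 0.9214, 0.9568, 0.9630

p_mvc = inc_beta_int(math.sqrt(l_mvc), N - b - 3, b + 3)
p_vc = inc_beta_int(math.sqrt(l_vc), N - b - 3, b + 2)
p_m = inc_beta_int(l_m ** (1.0 / (n - 1)), (N - 1) * (n - 1) // 2, (n - 1) // 2)
```

Each of the three probabilities exceeds .05, in agreement with the conclusion that none of the three hypotheses is rejected at the 5% level.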

EXAMPLE 2. In an experiment to study certain properties of the blood of
asphyxiated dogs, the %CO₂ and hematocrit of 10 asphyxiated dogs were measured
four minutes and seven minutes after asphyxiation. Let $X_1$ and $X_3$ be
%CO₂ and hematocrit four minutes after asphyxiation, respectively, and $X_2$
and $X_4$ be %CO₂ and hematocrit seven minutes after asphyxiation, respectively.
It is assumed that $(X_1, X_2, X_3, X_4)$ has a normal 4-variate distribution and
that the set of measurements $(X_{1\alpha}, X_{2\alpha}, X_{3\alpha}, X_{4\alpha})$ $(\alpha = 1, \cdots, 10)$ obtained
from the 10 dogs is a random sample of values of $(X_1, X_2, X_3, X_4)$. The following
questions will be considered, where the grouping is $(2^2)$: (a) Is the sample
consistent with the hypothesis $\bar{H}_1(mvc)$? (b) Is the sample consistent with the
hypothesis $\bar{H}_1(vc)$? (c) Is the sample consistent with the hypothesis $\bar{H}_1(m)$?
In the particular experiment under discussion (a) is the experimenter's question.

The sample means and sums of squares and cross-products are as follows:

            X_1        X_2        X_3        X_4
        294.916    313.908    -89.364    -69.282
        313.908    363.689   -130.422    -69.261
        -89.364   -130.422    210.356    241.688
        -69.282    -69.261    241.688    515.789

Means    50.780     53.590     41.180     43.890

This matrix is $||v_{ij}||$ $(i, j = 1, \cdots, 4)$ (see (4.3)). The sample criteria $\bar{L}_1(mvc)$,
$\bar{L}_1(vc)$, and $\bar{L}_1(m)$ will be used to answer questions (a), (b), and (c), respectively.
The values of these criteria can be computed from the data above (see (7.6),
(7.8), and (7.10)) and are found to be:

















$\bar{L}_1(mvc) = |v_{ij}| / |\bar{v}'_{ij}| = .09107,$
$\bar{L}_1(vc) = |v_{ij}| / |\bar{\tilde{v}}_{ij}| = .3259,$
$\bar{L}_1(m) = |\bar{\tilde{v}}_{ij}| / |\bar{v}'_{ij}| = .2794.$


The sixth, seventh, and eighth formulas in (7.13) (for $N = 10$, $n = 2$) give the
distributions of $\bar{L}_1(mvc)$, $\bar{L}_1(vc)$, and $\bar{L}_1(m)$, respectively (when the hypothesis
with which the criterion is associated is true). From [1] it is found that the
observed values of $\bar{L}_1(mvc)$, $\bar{L}_1(vc)$, and $\bar{L}_1(m)$ correspond to the 1.2, 12.4, and
.6 per cent points, respectively, of the distributions referred to above. Thus
at the 5% significance level the answer to questions (a) and (c) is no and to (b)
is yes. The critical values of $\bar{L}_1(mvc)$, $\bar{L}_1(vc)$, and $\bar{L}_1(m)$ for various significance
levels can be found from [3].
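Critical values of the kind read from tables can also be found by inverting the c.d.f. numerically. The sketch below uses bisection on the formulas of (7.13) for $N = 10$, $n = 2$, assumed here to read $I_{\sqrt{\bar{u}}}[N - 4, 3]$, $I_{\sqrt{\bar{y}}}[N - 4, 2]$, and $I_{\bar{z}'}[(N - 1)(n - 1) - 1, n - 1]$ with $\bar{z}' = \bar{z}^{1/[2(n-1)]}$. The function names are hypothetical, and the value .3259 for $\bar{L}_1(vc)$ is the one implied by the ratio $\bar{L}_1(mvc)/\bar{L}_1(m)$.

```python
import math


def inc_beta_int(x, a, b):
    """I_x(a, b) for positive integers a, b, as a binomial upper tail."""
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(a, n + 1))


def critical_value(cdf, level, tol=1e-10):
    """Bisection for the point in (0, 1) where an increasing cdf equals level."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


N, n = 10, 2


def cdf_mvc(u):
    return inc_beta_int(math.sqrt(u), N - 4, 3)


def cdf_vc(y):
    return inc_beta_int(math.sqrt(y), N - 4, 2)


def cdf_m(z):
    return inc_beta_int(z ** (1.0 / (2 * (n - 1))), (N - 1) * (n - 1) - 1, n - 1)


observed = {"mvc": 0.09107, "vc": 0.3259, "m": 0.2794}
cdfs = {"mvc": cdf_mvc, "vc": cdf_vc, "m": cdf_m}

# Probability of a criterion value at or below the observed one.
percent_points = {name: cdfs[name](value) for name, value in observed.items()}
# 5% critical values: the hypothesis is rejected when the criterion falls below them.
critical_5 = {name: critical_value(cdfs[name], 0.05) for name in cdfs}
```

An observed criterion below its entry in `critical_5` corresponds to a "no" answer at the 5% level, which is the case here for questions (a) and (c) but not for (b).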

More than one of the sample criteria may be of interest in regard to a given
sample (see [5] pp. 267-268). For example, in an experiment such as that
described in Example 1 suppose the answer to question (a) is no. The experimenter
might then consider question (b); if the answer is no, the inconsistency
of the sample with $H_1(mvc)$ might be regarded as due to the variances or covariances.
If the answer to (b) is yes, the experimenter might then consider (c);
if the answer here is no, the inconsistency of the sample with $H_1(mvc)$ might be
regarded as due to the means. If, however, the answer here is yes, further study
might be required to "explain" the inconsistency.

BRANCHING PROCESSES¹


By T. E. Harris 
Project RAND, Douglas Aircraft Company 


1. Summary. This paper is concerned with a simple mathematical model
for a branching stochastic process. Using the language of family trees we may
illustrate the process as follows. The probability that a man has exactly $r$
sons is $p_r$, $r = 0, 1, 2, \cdots$. Each of his sons (who together make up the first
generation) has the same probabilities of having a given number of sons of his
own; the second generation have again the same probabilities, and so on. Let
$z_n$ be the number of individuals in the $n$th generation. We study the probability
distribution of $z_n$. Some previous results are given in section 2; these include
procedures for computing moments of $z_n$, and a criterion for when the family
has probability 1 of dying out. In sections 3 and 4 the case is considered where
the family has a non-zero chance of surviving indefinitely. In this case the
random variables $z_n/Ez_n$ converge in probability to a random variable $w$ with
cumulative distribution $G(u)$. It is shown that $G(u)$ is absolutely continuous
for $u \ne 0$. Results of a Tauberian character are given for the behavior of $G(u)$
as $u \to 0$ and $u \to \infty$. In section 5 some examples are given where $G(u)$ can
be found explicitly; $G(u)$ is computed numerically for the case $p_1 = 0.4$, $p_2 = 0.6$.
In section 6 families with probability 1 of extinction are considered. A method
is given for obtaining in certain cases an expansion for the moment-generating
function of the number of generations before extinction occurs. In section 7
maximum likelihood estimates are obtained for the $p_r$ and for the expectation
$Ez_1$; consistency in a certain sense is proved. In section 8 a brief discussion
is given of the relation between two types of mathematical models for branching
processes.


2. Introduction. By a branching stochastic process is meant a phenomenon 
of the following general type: each of an initial aggregate of objects can give rise 
to more objects of the same or different types, the objects produced can then 
produce more, and the system develops, subject to certain probability laws. 
Examples are the development of human or animal populations, propagation of 
genes, and nuclear chain reactions. The mathematical model dealt with in this 
paper may be thought of as representing the generation-by-generation growth 
of a family, the fundamental random variable being the number of individuals 
in the nth generation. Under certain conditions, however, this model may 
describe the size of a family at a sequence of points in time. This question will 
be touched on in section 8. 


¹ Based on a doctoral dissertation presented to the Mathematics Department, Princeton
University, June, 1947.

DEFINITION 2.1. The random variables $z_n$, $n = 0, 1, 2, \cdots$, will be said to
represent a simple discrete branching process provided: $z_0 = 1$; $P(z_1 = r) = p_r$,
$r = 0, 1, 2, \cdots$, with $\sum_{r=0}^{\infty} p_r = 1$; the conditional distribution of $z_{n+1}$, given
$z_n = r$, is that of the sum of $r$ independent random variables, each having the
same distribution as $z_1$.

Assumptions. Throughout this paper we assume that $\sum_{r=0}^{\infty} r^2 p_r < \infty$, that at
least two of the $p_r$ are positive, and that $p_0 + p_1 < 1$.

DEFINITIONS 2.2. Let $x = Ez_1 = \sum r p_r$, $\sigma^2 = \text{Var}\,(z_1) = \sum r^2 p_r - x^2$. Let
$f(s) = \sum_{r=0}^{\infty} p_r s^r$ be the generating function of $z_1$ ($s$ denotes a complex variable).
Let $p_{nr} = P(z_n = r)$ and $f_n(s) = \sum_{r=0}^{\infty} p_{nr} s^r$; of course $p_{1r} = p_r$ and $f_0(s) = s$. The
assumptions given above insure that the first and second derivatives $f'(s)$ and
$f''(s)$ are continuous in the set consisting of the interior of the unit circle and the
point $s = 1$; thus derivative notations such as $f''(1)$ are used even though $f(s)$
may not be analytic at $s = 1$. It will be seen shortly that a similar remark
applies to the functions $f_n(s)$ and certain functions to be introduced later.

In the remainder of this section we shall summarize certain results; most of 
them are contained implicitly or explicitly in works by Fisher [1], Lotka [2], 
Steffensen [3], Ulam and Hawkins [4], Kolmogoroff [5], Kolmogoroff and Dmitriev 
[6], and Yaglom [7]; some of these references are not widely available. 

From our definition, $P(z_{n+1} = k \mid z_n = j)$ is the coefficient of $s^k$ in $[f(s)]^j$.
Hence $p_{n+1,k}$ is the coefficient of $s^k$ in $\sum_{j=0}^{\infty} p_{nj}[f(s)]^j$, whence

(2.1) $f_{n+1}(s) = f_n[f(s)].$

Letting $n = 1, 2, \cdots$, successively, it follows that the generating function of $z_n$
is the $n$th functional iterate of $f(s)$. Hence

(2.2) $f_{n+1}(s) = f[f_n(s)].$

We note that $f'_n(1) = Ez_n$, $f''_n(1) + f'_n(1) - [f'_n(1)]^2 = \text{Var}\,(z_n)$. Differentiation
of (2.1) at $s = 1$ gives $f'_{n+1}(1) = x^{n+1}$; another differentiation gives $f''_{n+1}(1) =
f''_n(1)[f'(1)]^2 + f'_n(1)f''(1)$ while twofold differentiation of (2.2) gives $f''_{n+1}(1) =
f''(1)[f'_n(1)]^2 + f'(1)f''_n(1)$; these two expressions for $f''_{n+1}(1)$ can be equated and
solved for $f''_n(1)$, provided $x = f'(1) \ne 1$. Thus the mean and variance of $z_n$ are
given by $Ez_n = (Ez_1)^n = x^n$; $\text{Var}\,(z_n) = \sigma^2 x^n (x^n - 1)/(x^2 - x)$, $x \ne 1$; $\text{Var}\,(z_n) = n\sigma^2$,
$x = 1$. Higher moments, if they exist, may be found by a similar process.
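The iteration $f_{n+1}(s) = f_n[f(s)]$ can be carried out exactly when only finitely many $p_r$ are positive, since the generating functions are then polynomials. The following minimal sketch builds the distribution of $z_n$ in this way and compares its mean and variance with $x^n$ and $\sigma^2 x^n(x^n - 1)/(x^2 - x)$. The function names and the offspring distribution are hypothetical, chosen only for illustration.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def next_generation(pn, p):
    """Coefficients of f_{n+1}(s) = f_n[f(s)] = sum_j p_{nj} [f(s)]^j."""
    out = [0.0] * ((len(pn) - 1) * (len(p) - 1) + 1)
    power = [1.0]  # [f(s)]^0
    for pnj in pn:
        for k, coeff in enumerate(power):
            out[k] += pnj * coeff
        power = poly_mul(power, p)
    return out


def generation_distribution(p, n):
    """Probabilities P(z_n = r), starting from z_0 = 1, i.e. f_0(s) = s."""
    pn = [0.0, 1.0]
    for _ in range(n):
        pn = next_generation(pn, p)
    return pn


def mean_and_variance(pmf):
    mean = sum(r * pr for r, pr in enumerate(pmf))
    second = sum(r * r * pr for r, pr in enumerate(pmf))
    return mean, second - mean * mean


# Hypothetical offspring distribution: p_0 = 0.2, p_1 = 0.3, p_2 = 0.5.
p = [0.2, 0.3, 0.5]
x, sigma2 = mean_and_variance(p)

n = 4
mean_n, var_n = mean_and_variance(generation_distribution(p, n))
formula_var = sigma2 * x**n * (x**n - 1) / (x**2 - x)
```

The polynomial degree grows geometrically with $n$, so this exact approach is practical only for small $n$ and short offspring distributions.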
DEFINITION 2.3. Denote by $\alpha$ the smallest non-negative real root of the
equation $t = f(t)$. We see that $x \le 1$ implies $\alpha = 1$ while $x > 1$ implies
$0 \le \alpha < 1$, the equality $\alpha = 0$ holding if and only if $p_0 = 0$. In no case can the
half-open interval $0 \le t < 1$ contain more than one root. It is readily seen that

(2.3) $\lim_{n \to \infty} p_{n0} = \lim_{n \to \infty} f_n(0) = \alpha.$

We thus have the well known result: the number $\alpha$ is the probability of eventual
extinction of the family. The relation between $\alpha$ and $x$ shows that the probability
of extinction is 1 if and only if $x \le 1$.
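Relation (2.3) suggests a direct way to approximate the extinction probability: iterate $t \mapsto f(t)$ starting from $t = 0$. The sketch below does this for two hypothetical offspring distributions, one with $x > 1$ and one with $x < 1$; the function names and the fixed iteration count are illustrative choices rather than part of the paper.

```python
def pgf(p, s):
    """Evaluate f(s) = sum_r p_r s^r by Horner's rule."""
    value = 0.0
    for coeff in reversed(p):
        value = value * s + coeff
    return value


def extinction_probability(p, iterations=500):
    """Approximate alpha = lim f_n(0) by repeated application of f to 0."""
    t = 0.0
    for _ in range(iterations):
        t = pgf(p, t)
    return t


# x = 1.3 > 1: t = f(t) has the roots 0.4 and 1, so alpha should be 0.4.
alpha_supercritical = extinction_probability([0.2, 0.3, 0.5])

# x = 0.7 < 1: extinction is certain, so the iterates should approach 1.
alpha_subcritical = extinction_probability([0.5, 0.3, 0.2])
```

Convergence is geometric with ratio $f'(\alpha)$, so it slows down markedly as $x$ approaches 1; a fixed iteration count is adequate only away from that case.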

It is also clear that $0 \le t < 1$ implies $\lim f_n(t) = \alpha$; this, together with (2.3),
shows that

(2.4) $\lim_{n \to \infty} p_{nr} = 0, \quad r = 1, 2, \cdots.$

Relation (2.4) means roughly that the family either dies out or gets very large.
In section 4 it will be shown that (2.4) holds uniformly in $r$.

DEFINITION 2.4. The random variables $w_n$ are defined by $w_n = z_n/x^n$.

Clearly $Ew_n = 1$ and $Ew_n^2 = 1 + \sigma^2(1 - x^{-n})/(x^2 - x)$ if $x \ne 1$.
Suppose $n > m$. Then $E(z_n z_m) = \sum_r p_{mr} E(r z_n \mid z_m = r) = \sum_r p_{mr} r^2 x^{n-m} =
x^{n-m} Ez_m^2$. Thus $E(w_n w_m) = Ew_m^2$, whence

(2.5) $E(w_n - w_m)^2 = Ew_n^2 - Ew_m^2, \quad n > m.$

By virtue of (2.5) we obtain

THEOREM 2.1. If $x > 1$, the random variables $w_n$ converge in mean square,
hence in probability, to a random variable $w$.

For in this case $Ew_n^2 \to 1 + \sigma^2/(x^2 - x)$ and (2.5) shows that
$E(w_n - w_m)^2 \to 0$ as $n$ and $m \to \infty$. Theorem 2.1 is then a consequence of [8],
p. 38, I.

It is well known that convergence in mean square implies $Ew_n \to Ew$ and
$E(w_n - 1)^2 \to E(w - 1)^2$, whence $Ew_n^2 \to Ew^2$.
Thus we have

(2.6) $Ew = 1, \quad Ew^2 = 1 + \sigma^2/(x^2 - x).$

In order to study the behavior of $z_n$ for large $n$ when $x > 1$, we consider the
distribution of $w$.


DEFINITIONS 2.5. $G_n(u) = P(w_n \le u)$; $\phi_n(s) = E(e^{s w_n}) = \int e^{su}\, dG_n(u)$.
DEFINITIONS 2.6. (Applicable when $x > 1$.) $G(u) = P(w \le u)$; $\phi(s) =
E(e^{sw}) = \int e^{su}\, dG(u)$. We shall refer to $G(u)$ as the asymptotic distribution
branching from $f(s)$.

The moment-generating functions (m.g.f.'s) $\phi_n(s)$ and $\phi(s)$ are defined at least
for Re $(s) \le 0$. Unless specifically stated otherwise we shall consider them only
in that domain.

From (2.2) and the fact that $\phi_n(s) = f_n[e^{s/x^n}]$ it follows that $\phi_{n+1}(sx) = f[\phi_n(s)]$.
Theorem 2.1 implies that if $x > 1$, $G_n(u) \to G(u)$ and $\phi_n(s) \to \phi(s)$ for Re $(s) \le 0$.
Thus the m.g.f. $\phi(s)$ satisfies the functional equation

(2.7) $\phi(sx) = f[\phi(s)], \quad \text{Re}\,(s) \le 0.$

Equation (2.7), which of course is applicable only when $x > 1$, was obtained in a
different form by Ulam and Hawkins. It belongs to a type usually known as
Koenigs' equation, after the nineteenth century mathematician who studied it
in connection with functional iteration, and is related to an equation studied by
Abel. We shall make some use of the work of Koenigs later. See Hadamard [9]
and Koenigs [10].

We note that $Ew^r < \infty$ if and only if $Ez_1^r < \infty$. It was already pointed out
that $Ew = 1$. As pointed out in [4], as many further moments of $w$ as exist
may be found by successive differentiation of (2.7) at $s = 0$.
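Successive differentiation of (2.7) at $s = 0$ amounts to matching power-series coefficients. Writing $\phi(s) = \sum_r c_r s^r$ with $c_0 = c_1 = 1$, the coefficient of $s^r$ in $\phi(sx)$ is $c_r x^r$, while in $f[\phi(s)]$ it is $x c_r$ plus a term depending only on $c_0, \cdots, c_{r-1}$; hence $c_r = (\text{that term})/(x^r - x)$ and $Ew^r = r!\,c_r$. The sketch below implements this recursion for an offspring distribution with finitely many positive $p_r$ and $x > 1$; the function names and the example distribution are hypothetical.

```python
def mul_trunc(a, b, deg):
    """Product of two power series, truncated after the term of degree deg."""
    out = [0.0] * (deg + 1)
    for i, ai in enumerate(a[: deg + 1]):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b[: deg + 1 - i]):
            out[i + j] += ai * bj
    return out


def compose_pgf(p, phi, deg):
    """Truncated series of f[phi(s)], using Horner's rule on f."""
    result = [0.0] * (deg + 1)
    for coeff in reversed(p):
        result = mul_trunc(result, phi, deg)
        result[0] += coeff
    return result


def mgf_coefficients(p, max_order):
    """Coefficients c_0, ..., c_max_order of phi(s), assuming x > 1."""
    x = sum(r * pr for r, pr in enumerate(p))
    c = [1.0, 1.0]
    for r in range(2, max_order + 1):
        trial = c + [0.0]  # c_r temporarily set to zero
        rest = compose_pgf(p, trial, r)[r]  # part not involving c_r
        c.append(rest / (x**r - x))
    return c


# Hypothetical offspring distribution with x = 1.3 and sigma^2 = 0.61.
p = [0.2, 0.3, 0.5]
c = mgf_coefficients(p, 4)
second_moment_of_w = 2.0 * c[2]
```

For this distribution `second_moment_of_w` can be checked against $1 + \sigma^2/(x^2 - x)$, the value given in (2.6).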

Finally we note that $G_n(0) = p_{n0}$. Hence $\lim G_n(0) = \alpha$. Thus $G(0) =
P(w = 0) \ge \alpha$. We show later that $G(0) = \alpha$. Clearly $G(u) = 0$ for $u < 0$.

In sections 3 and 4 we always assume $x > 1$.


3. Asymptotic properties of the moment-generating function. We first
show that (2.7) uniquely determines the distribution of $w$. Specifically,

THEOREM 3.1. Let $G_1(u)$ and $G_2(u)$ be distributions with equal first moments
and finite second moments whose characteristic functions $\phi_1(it)$ and $\phi_2(it)$ satisfy
($t$ is real) $\phi_r(itx) = f[\phi_r(it)]$, $r = 1, 2$. Then $G_1(u) = G_2(u)$.

From [13], p. 27, $\phi_1(it) - \phi_2(it) = t^2 B(t)$, where $B(t)$ is bounded as $t \to 0$.
From (2.7), $|\phi_1(itx) - \phi_2(itx)| = |f[\phi_1(it)] - f[\phi_2(it)]| \le x\,|\phi_1(it) - \phi_2(it)|$,
since $|f'(s)| \le x$ when $|s| \le 1$. Hence for $t \ne 0$, $|B(t/x)| \ge x\,|B(t)|$. Thus
$B(t)$ cannot be bounded near $t = 0$ unless it is identically zero; hence

$\phi_1(it) = \phi_2(it).$

It is clear that the requirement that $\phi(s)$ have the form $1 + s + O(s^2)$ between
two rays from the origin is sufficient for the uniqueness in that domain of solutions
of (2.7). On the other hand, continuous solutions can be constructed at
will if the existence of a derivative near $s = 0$ is not required.

Before proceeding further, it is convenient to define three functions $k(s)$,
$\psi(s)$, and $H(u)$ which are closely related to $f(s)$, $\phi(s)$, and $G(u)$ respectively. We
repeat that we are considering only the case $x > 1$. See definition 2.3 for $\alpha$.

DEFINITIONS 3.1. Let $k(s) = \{f[s(1 - \alpha) + \alpha] - \alpha\}/(1 - \alpha)$. Clearly $k(s)$ is a
probability generating function with $k(0) = 0$, $k'(1) = f'(1) = x$, $k''(1) < \infty$. We
write $k(s) = \sum_{r=1}^{\infty} q_r s^r$. We also define the iterates $k_n(s)$ by

$k_0(s) = s, \quad k_{n+1}(s) = k[k_n(s)].$
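The coefficients $q_r$ of $k(s)$ can be written down explicitly when only finitely many $p_r$ are positive: expanding $f[s(1 - \alpha) + \alpha]$ by the binomial theorem gives $q_r = (1 - \alpha)^{r-1} \sum_{j \ge r} p_j \binom{j}{r} \alpha^{j-r}$ for $r \ge 1$. The sketch below, with hypothetical function names and a hypothetical offspring distribution, computes $\alpha$ by iterating $f$ from 0 and then forms the $q_r$; it assumes $x > 1$, so that $\alpha < 1$.

```python
import math


def pgf(p, s):
    """Evaluate f(s) = sum_r p_r s^r by Horner's rule."""
    value = 0.0
    for coeff in reversed(p):
        value = value * s + coeff
    return value


def conditioned_offspring(p, iterations=500):
    """Return alpha and the coefficients q_0, q_1, ... of k(s)."""
    alpha = 0.0
    for _ in range(iterations):  # alpha = lim f_n(0)
        alpha = pgf(p, alpha)
    q = []
    for r in range(len(p)):
        # Coefficient of s^r in f[s(1 - alpha) + alpha].
        coeff = (1.0 - alpha) ** r * sum(
            p[j] * math.comb(j, r) * alpha ** (j - r) for j in range(r, len(p))
        )
        if r == 0:
            coeff -= alpha  # subtract alpha from the constant term
        q.append(coeff / (1.0 - alpha))
    return alpha, q


# Hypothetical offspring distribution with x = 1.3.
alpha, q = conditioned_offspring([0.2, 0.3, 0.5])
mean_of_k = sum(r * qr for r, qr in enumerate(q))
```

The constant term `q[0]` vanishes (up to rounding) because $f(\alpha) = \alpha$, and `mean_of_k` reproduces $x$, in line with $k(0) = 0$ and $k'(1) = x$.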


DEFINITIONS 3.2. Let $H(u)$ be the asymptotic distribution branching from $k(s)$.
(See Definition 2.6.) Let $\psi(s)$ be the corresponding moment-generating function.
We know then that $\psi(s)$ and $k(s)$ satisfy

(3.1) $\psi(sx) = k[\psi(s)].$

In view of the uniqueness theorem we have, by direct substitution in (3.1), that
$\psi(s)$ must be given by

(3.2) $\psi(s) = \{\phi[(1 - \alpha)s] - \alpha\}/(1 - \alpha),$

and that $H(u)$ must be given by

(3.3) $H(u) = \{G[u/(1 - \alpha)] - \alpha\}/(1 - \alpha), \quad u \ge 0; \qquad H(u) = 0, \quad u < 0.$

We shall see later that $H(0) = 0$; i.e., that $G(0) = \alpha$. Therefore $H(u)$ is the
conditional distribution of $(1 - \alpha)w$, given that $w \ne 0$. Another way of stating
this is as follows:

THEOREM 3.2. The random variable $w$ is distributed as the product of two independent
random variables $w_0 \cdot w'$, where $w_0$ takes the values 0 and $1/(1 - \alpha)$ with
probabilities $\alpha$ and $1 - \alpha$ respectively while $w'$ has the asymptotic distribution branching
from $k(s)$.

For it is directly verifiable that $\phi(s)$ is the m.g.f. of $w_0 \cdot w'$.

In theorems 3.3 and 3.4 we consider the behavior of $\psi(s)$ for large $|s|$. To
make for smoother reading we defer the proofs till section 9, where somewhat
more general formulations are given. In section 4 the properties of $\psi(s)$ are
interpreted in terms of $G(u)$.


DEFINITION 3.3. Let $\gamma = \log_x (1/q_1) = \log_x [1/f'(a)]$. (See definitions 2.3 and
3.1.) If $q_1 = 0$ (i.e., $p_0 = p_1 = 0$) we take $\gamma = \infty$.

THEOREM 3.3. Suppose $\gamma < \infty$. Then if $\mathrm{Re}(s) \le 0$ and $s \neq 0$,

(3.4) $\psi(s) = \dfrac{M(s)}{|s|^{\gamma}} + M_0(s)$.

$M(s)$ is continuous for $s \neq 0$; $M(s)$ and $M_0(s)$ satisfy respectively

(3.5) $M(sx) = M(s)$; $M_0(s) = o\!\left(\dfrac{1}{|s|^{\gamma}}\right)$, $|s| \to \infty$.

Remarks. (See section 9 for proof.) (a) Under the conditions of the theorem
$M(s)$ is real and positive when $s$ is real and negative. (b) If $E z_1^r < \infty$ and the
conditions of the theorem hold, the $r$th derivative of $\psi(s)$ satisfies

(3.6) $|\psi^{(r)}(s)| = O\!\left(\dfrac{1}{|s|^{\gamma + r}}\right)$, $|s| \to \infty$.

(c) If $\gamma = \infty$, $\psi(s)$ and as many derivatives as exist approach 0 exponentially
as $|s| \to \infty$.

We now consider the behavior of $\psi(s)$ on the positive real axis, provided it is
defined there.
LEMMA 3.1. Let $f(s)$ be analytic in the circle $|s| < \alpha$, $\alpha > 1$. Then $\varphi(s)$ and
$\psi(s)$ are analytic in some neighborhood of $s = 0$.

We use a theorem of Poincaré [11] which insures that there is exactly one
function $\tilde\varphi(s)$ analytic near $s = 0$ with $\tilde\varphi(0) = \tilde\varphi'(0) = 1$ and satisfying

$$\tilde\varphi(sx) = f[\tilde\varphi(s)].$$

(Although Poincaré's proof is for the case $f(s)$ rational, it applies equally well
here.) The circle of convergence of the MacLaurin series for $\tilde\varphi(s)$ has radius $t_2$,
where $\tilde\varphi(t_2) = \alpha$. An argument whose details are given in [12], p. 21, then
shows that $\varphi(s) = \tilde\varphi(s)$ for $|s| < t_2$, and Lemma 3.1 follows. (The argument
is necessary to rule out the possibility that the $\varphi_n(s)$ converge to $\tilde\varphi(s)$ for
$\mathrm{Re}(s) \le 0$ but to some other function for $\mathrm{Re}(s) > 0$.) Clearly $\varphi(s)$ and $\psi(s)$
are entire if and only if $f(s)$ is entire.

Lemma 3.1 is useful for actual computation of $G(u)$. The (non-negative)
coefficients $c_r$ in the series $\varphi(s) = 1 + s + c_2 s^2 + \cdots$ can be determined by
differentiating (2.7) at $s = 0$. The series can be used to compute values of the
characteristic function $\varphi(it)$ on some interval $t_0 \le t \le t_0 x$, where $t_0$ is a small real
number; the values of $\varphi(it)$ for the remaining values of $t$ are determined by (2.7).
(Note that the real and imaginary parts of $\varphi(it)$ are respectively even and odd.)
Then the usual inversion formula is used to obtain $G(u)$. A numerical example
of this procedure is worked out in section 5.
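
The coefficient computation described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for Table I: it assumes $f$ is a polynomial, reads the functional equation as $\varphi(sx) = f[\varphi(s)]$ with $c_0 = c_1 = 1$, and compares coefficients of $s^r$, which gives $(x^r - x)c_r$ equal to the coefficient of $s^r$ in $f[\varphi(s)]$ computed with $c_r$ set to zero. The function names `poly_mul_trunc` and `phi_coefficients` are hypothetical.

```python
from fractions import Fraction


def poly_mul_trunc(a, b, deg):
    """Multiply two coefficient lists, keeping terms up to s**deg."""
    out = [Fraction(0)] * (deg + 1)
    for i, ai in enumerate(a[:deg + 1]):
        if ai == 0:
            continue
        for j, bj in enumerate(b[:deg + 1 - i]):
            out[i + j] += ai * bj
    return out


def phi_coefficients(p, n_terms):
    """Return c_0..c_{n_terms-1} with phi(s) = sum c_r s^r solving
    phi(s*x) = f(phi(s)), where f(s) = sum_j p[j] s^j and c_0 = c_1 = 1.

    p is a list of probabilities given as strings or Fractions so that
    the arithmetic stays exact.
    """
    p = [Fraction(v) for v in p]
    x = sum(j * pj for j, pj in enumerate(p))        # x = f'(1)
    c = [Fraction(1), Fraction(1)]
    for r in range(2, n_terms):
        trial = c + [Fraction(0)]                    # c_r provisionally 0
        power = [Fraction(1)] + [Fraction(0)] * r    # trial**0
        coef = Fraction(0)
        for pj in p:                                 # accumulate p_j * [s^r] trial**j
            coef += pj * power[r]
            power = poly_mul_trunc(power, trial, r)
        # c_r enters f(phi) linearly with factor f'(1) = x, hence x**r - x.
        c.append(coef / (x ** r - x))
    return c


if __name__ == "__main__":
    for r, cr in enumerate(phi_coefficients(["0", "0.4", "0.6"], 8)):
        print(r, float(cr))
```

For the case $f(s) = 0.4s + 0.6s^2$ of section 5 this gives $c_2 = 5/8$, so that $E(w-1)^2 = 2c_2 - 1 = 0.25$.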

DEFINITION 3.4. The number $\rho$ is defined by $\rho = \log_x d$ if $f(s)$ is a polynomial
of degree $d$, $\rho = \infty$ otherwise.

THEOREM 3.4. Let $f(s)$ (and hence $k(s)$) be a polynomial of degree $d$. Then
for $s > 0$

$$\frac{\log \psi(s)}{s^{\rho}} = L(s) + L_0(s);$$

$L(s)$ is continuous and positive; $L(s)$ and $L_0(s)$ satisfy respectively

$$L(sx) = L(s); \qquad L_0(s) = O\!\left(\frac{1}{s^{\rho}}\right), \quad s \to \infty.$$

The proof is in section 9. (Theorem 3.4 may be compared with a more widely
applicable but less precise result due to Shah [19].)

COROLLARY. If $f(s)$ is a polynomial of degree $d$, $\psi(s)$ is an entire function of
order $\rho$ and type $C$ where $C = \max L(s)$, $1 \le s \le x$.

An explicit determination for $C$ has not been found. An approximate numerical
determination is not difficult; the function

$$L(s) = \lim_{n \to \infty} \frac{\log k_n[\psi(s)]}{s^{\rho} d^{n}}$$

can be determined numerically for a number of values on some convenient interval
$s_0 \le s \le s_0 x$, and the maximum value approximated. The importance of $C$
will be indicated in the conjecture following Theorem 4.3. We may also mention
that the quantity $[\max L(s) - \min L(s)]$, $1 \le s \le x$, is of some interest.
Some numerical work indicates that in certain cases $L(s)$ is at least approximately
constant.
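
A numerical determination of $L(s)$ along these lines might look like the sketch below. It is only an illustration under stated assumptions: it takes the case $f(s) = 0.4s + 0.6s^2$ of section 5 (so $a = 0$, $k = f$, $\psi = \varphi$, $d = 2$), evaluates $\psi(s)$ from its power series, and reads the limit as $L(s) = \lim d^{-n} \log k_n[\psi(s)] / s^{\rho}$, which is the form implied by part (B) of section 9. The iteration is carried out on $y = \log k_n$ to avoid overflow. All function names are hypothetical.

```python
import math

# Assumed example from section 5: f(s) = 0.4 s + 0.6 s^2, so k = f and psi = phi.
X, P1, P2, D = 1.6, 0.4, 0.6, 2
RHO = math.log(D) / math.log(X)            # rho = log_x d


def coefficients(n_terms):
    """Power-series coefficients of psi from psi(s*x) = k(psi(s)), c_0 = c_1 = 1."""
    c = [1.0, 1.0]
    for n in range(2, n_terms):
        conv = sum(c[j] * c[n - j] for j in range(1, n))
        c.append(P2 * conv / (X ** n - X))
    return c


def psi(s, c):
    """Evaluate the truncated series at real s."""
    return sum(cn * s ** n for n, cn in enumerate(c))


def L(s, n_iter=40, n_terms=60):
    """Approximate L(s) = lim log k_n[psi(s)] / (s**rho * d**n)."""
    y = math.log(psi(s, coefficients(n_terms)))     # y = log psi(s)
    for _ in range(n_iter):
        # log k(e^y) for k(t) = P1 t + P2 t^2, written as 2y + log(P2 + P1 e^{-y})
        y = 2.0 * y + math.log(P2 + P1 * math.exp(-y))
    return y / (D ** n_iter * s ** RHO)


if __name__ == "__main__":
    for s in (1.0, 1.2, 1.4, 1.6):
        print(s, L(s))
```

Evaluating `L` at a few points of $1 \le s \le x$ and taking the largest value gives an estimate of $C$; the spread of the values indicates how nearly constant $L(s)$ is.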
4. Some properties of G(u). Since it will be convenient to work with $H(u)$
rather than $G(u)$, we state the content of Theorems 4.1, 4.2, and 4.3 in terms
of $G(u)$: $G(u) = a + \int_0^u g(v)\,dv$ for $u \ge 0$. The density $g(u)$ is continuous for
$u \neq 0$. If $E z_1^k < \infty$ then $g^{(r)}(u)$ is continuous for $u \neq 0$ provided $r < \gamma + k - 1$
and is continuous for $u = 0$ provided $r < \gamma - 1$. Near $u = 0$, $G(u)$, provided
$\gamma < \infty$, approximates, in a certain mean sense made clear by Theorem 4.2, the
function $a + \dfrac{(1-a)^{1+\gamma}}{\Gamma(1+\gamma)}\, u^{\gamma} M[u(1-a)]$, where for convenience we have defined
$M(u)$ for positive $u$ by $M(u) = M(-u)$. It is then shown that in a certain sense
$g(u)$ goes to zero faster than $\exp(-u^{Q-\epsilon})$ and slower than $\exp(-u^{Q+\epsilon})$ where $\epsilon$ is
any positive number, $Q$ being defined in Theorem 4.3. A conjecture is given of a
more precise result, applicable when $f(s)$ is a polynomial: in the same sense $g(u)$
goes to zero (more, less) rapidly than ($\exp[-(A^* - \epsilon)u^Q]$, $\exp[-(A^* + \epsilon)u^Q]$),
where $A^*$ is defined in the conjecture.

DEFINITION 4.1. Let $H'(u) = h(u)$.

THEOREM 4.1. $H(u)$ is absolutely continuous.

Theorem 3.3 shows that $H(u)$ is continuous; see [13], p. 25. This incidentally
shows that $G(0) = a$. If $\gamma > \tfrac{1}{2}$ the absolute continuity of $H(u)$ follows from
the Plancherel theorem. See any text on Fourier transforms. In any case, define
the functions

$$h_m(u) = \frac{1}{2\pi} \int_{-m}^{m} e^{-iut}\, \psi(it)\, dt, \qquad m = 1, 2, \cdots.$$


An integration by parts² gives for $u \neq 0$

(4.1) $h_m(u) = \dfrac{-1}{2\pi i u}\left[\psi(im)e^{-ium} - \psi(-im)e^{ium}\right] + \dfrac{1}{2\pi i u} \displaystyle\int_{-m}^{m} e^{-iut}\, \frac{d\psi(it)}{dt}\, dt.$

If $0 < u_1 \le u \le u_2 < \infty$, (4.1), (3.4), and (3.6) show that the continuous functions
$h_m(u)$ converge uniformly in $[u_1, u_2]$ to a continuous function $h(u)$. Moreover

(4.2) $H(u_2) - H(u_1) = \displaystyle\lim_{m \to \infty} \int_{-m}^{m} \frac{e^{-iu_1 t} - e^{-iu_2 t}}{2\pi i t}\, \psi(it)\, dt = \lim_{m \to \infty} \int_{u_1}^{u_2} h_m(u)\, du = \int_{u_1}^{u_2} h(u)\, du,$

the first equality in (4.2) following from [13], p. 28 and the second from the fact
that the $h_m(u)$ are uniformly bounded for $u_1 \le u \le u_2$. In case $E z_1^k < \infty$
and $r < \gamma + k - 1$, repeated integration by parts of (4.1) and reference to
remark (b), Theorem 3.3, shows that the first $r$ derivatives of $h(u)$ are continuous
if $u \neq 0$. The usual integral expression for $h(u)$ in terms of $\psi(it)$ shows
that $\gamma > r + 1$ implies $h^{(r)}(u)$ is continuous at 0.


² I am indebted to J. W. Tukey for this suggestion, which simplifies the original proof.
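COROLLARY TO THE CONTINUITY OF H(u): the numbers $p_{nr} = P(z_n = r) \to 0$
uniformly in $r$, $r \ge 1$, as $n \to \infty$. We have

$$p_{nr} = G_n\!\left(\frac{r}{x^n}\right) - G_n\!\left(\frac{r-1}{x^n}\right) \le \left| G_n\!\left(\frac{r}{x^n}\right) - G\!\left(\frac{r}{x^n}\right) \right| + \left| G\!\left(\frac{r}{x^n}\right) - G\!\left(\frac{r-1}{x^n}\right) \right| + \left| G\!\left(\frac{r-1}{x^n}\right) - G_n\!\left(\frac{r-1}{x^n}\right) \right|.$$

The desired result follows because $G_n(u) \to G(u)$ uniformly for $u \ge 0$ and because
$G(u)$ must be uniformly continuous for $0 \le u < \infty$ (right-continuity at 0).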

We next consider the behavior of $H(u)$ near $u = 0$, when $\gamma < \infty$. Theorem
3.3 suggests what sort of result may be expected. If the function $M(s)$ of
Theorem 3.3 were a constant $M$ it would follow from a Tauberian theorem due
to Karamata (see [14], pp. 189-192) that $H(u) \sim \dfrac{M u^{\gamma}}{\Gamma(\gamma + 1)}$ as $u \to 0+$, or
$\dfrac{H(u)}{u^{\gamma+1}} \sim \dfrac{M}{u\,\Gamma(\gamma + 1)}$. Integrating both sides of this relation from $u$ to $ux$ would
give

(4.3) $\displaystyle\int_u^{ux} \frac{H(v)\, dv}{v^{\gamma+1}} \sim \frac{1}{\Gamma(\gamma + 1)} \int_1^x \frac{M\, dv}{v}.$

The analogue of (4.3) turns out to be true, as shown by Theorem 4.2, which
shows that in a certain mean sense, $H(u)$ behaves like $\dfrac{u^{\gamma} M(u)}{\Gamma(\gamma + 1)}$ as $u \to 0+$.
(We defined $M(u) = M(-u)$ for $u > 0$.)

THEOREM 4.2.

$$\lim_{u \to 0+} \int_u^{ux} \frac{H(v)\, dv}{v^{\gamma+1}} = \frac{1}{\Gamma(\gamma + 1)} \int_1^x \frac{M(v)\, dv}{v}.$$

The proof, which follows directly along the lines of the proof of Karamata's
theorem, is sketched briefly in section 9, for a somewhat more general situation.

A corollary of Theorem 4.2 is that if $\gamma < 1$, $h(u)$ cannot be bounded as $u \to 0+$;
for $h(u) \le K$ implies $H(v) \le Kv$ and hence

$$\lim_{u \to 0+} \int_u^{ux} \frac{K v\, dv}{v^{\gamma+1}} \ge \frac{1}{\Gamma(\gamma + 1)} \int_1^x \frac{M(v)\, dv}{v} > 0,$$

which implies $\gamma \ge 1$. An example to be given in section 5 shows that if $\gamma = 1$
$h(u)$ is at least in certain cases bounded but discontinuous at 0.

In order to consider the behavior of $H(u)$ as $u \to \infty$ we first prove a theorem
which applies to any distribution whose m.g.f. is an entire function.

THEOREM 4.3.³ Let $F(u)$ be any c.d.f. whose m.g.f. $\xi(s)$ is entire. Let $\rho$ be the
order of $\xi(s)$. Let $Q$ be defined by

$$Q = \text{l.u.b. } q: \int_{-\infty}^{\infty} e^{|u|^{q}}\, dF(u) < \infty.$$

Then $\dfrac{1}{\rho} + \dfrac{1}{Q} = 1$.

The proof is given in section 9.


³ Before completing the present proof, the writer communicated this result to R. P. Boas,
Jr., who sent back a proof along different lines.
Combining Theorems 3.4 and 4.3, we obtain immediately

THEOREM 4.4. Let $Q = \text{l.u.b. } q: \int_0^{\infty} e^{u^{q}} h(u)\, du < \infty$. Then $Q = \dfrac{\rho}{\rho - 1}$.

Here $\rho$ is given by definition 3.4. If $f(s)$ is not a polynomial, whether entire
or not, the proof of theorem 4.3 will show that $Q = 1$, and we interpret theorem
4.4 in that sense. The trivial case $f(s) = s^d$ is excluded, so $\rho > 1$.

CONJECTURE. Let $\xi(s)$ of theorem 4.3 be of finite order $\rho$ and of type $C$,
$0 < C < \infty$. Let $Q = \dfrac{\rho}{\rho - 1}$ and let $A = \text{l.u.b. } A': \int e^{A'|u|^{Q}}\, dF(u) < \infty$.
Then $(C\rho)^{Q} \cdot (AQ)^{\rho} = 1$.

The proof for the case $\rho$ rational follows the same lines as the proof of Theorem
4.3; a general proof has not been found. If the conjecture is true then having
determined $\rho$ and $Q$, when $f(s)$ is a polynomial, and having estimated $C$ by the
procedure indicated following the corollary to theorem 3.4, we obtain

(4.4) $A = \dfrac{1}{Q} \left( \dfrac{1}{C\rho} \right)^{1/(\rho - 1)}$

for the l.u.b. of the numbers $A'$ such that $\int_0^{\infty} e^{A' u^{Q}} h(u)\, du < \infty$. The
corresponding number $A^*$ which applies to $g(u)$ is given by

(4.5) $A^* = A(1 - a)^{Q}$.

5. Some special cases. In this section we shall discuss some special cases in
which the m.g.f. $\varphi(s)$ and the c.d.f. $G(u)$ may be determined explicitly. For
these cases and for certain others there is a close relationship between the simple
discrete branching process and another type of model to be discussed in section 8.
Finally a numerical computation of the distribution $G(u)$ will be given for a
particular case where $f(s)$ is a second degree polynomial.

Suppose $f(s)$ has the form

$$f(s) = 1 - \frac{x(1 - s)}{1 + \alpha(1 - s)}$$

with $x > 1$, $\alpha > x - 1$, where $f'(1) = x$ and $f''(1) + f'(1) = E z_1^2 = x(1 + 2\alpha)$.
It is easily verified (as pointed out by Poincaré in [11]), that the solution of the
equation $\varphi(sx) = f[\varphi(s)]$ is given by $\varphi(s) = 1 + \dfrac{(x-1)s}{(x-1) - \alpha s}$, with $\varphi(0) = \varphi'(0) = 1$.
The number $a$ satisfying $a = f(a)$ is given by $a = \dfrac{\alpha + 1 - x}{\alpha}$. The functions
$\psi(s)$ and $k(s)$ of section 3 are given by $\psi(s) = \dfrac{1}{1 - s}$, $k(s) = \dfrac{s}{x - (x-1)s}$.

The number $\gamma$ of Theorem 3.3 is 1. The density function $h(u)$ (definition 4.1)
is simply $e^{-u}$, as seen by direct calculation. The number $Q$ of Theorem 4.3 is 1,
as it should be, since $f(s)$ is not an entire function. The c.d.f. $H(u)$ is $1 - e^{-u}$,
and $H(u) \sim u$ near $u = 0$, in agreement with Theorem 4.2. Various aspects
of the case $f(s) = \dfrac{As + B}{Cs + D}$ have been discussed by numerous authors.


Somewhat more generally, we may consider generating functions of the form

(5.1) $k(s) = s[x - (x - 1)s^{m}]^{-1/m}$, $x > 1$.

The function $k(s)$ is a generating function if and only if $m$ is a positive
integer. In this case we have $\varphi(s) = \psi(s) = (1 - ms)^{-1/m}$ and

$$g(u) = h(u) = \frac{u^{(1/m) - 1}\, e^{-u/m}}{m^{1/m}\, \Gamma(1/m)}.$$

Here $\gamma = \dfrac{1}{m}$, and we note that unless $m = 1$ the density function $h(u)$ is
unbounded near $u = 0$. A physical interpretation for this case will be given in section 8.

As a numerical illustration we consider the case $f(s) = 0.4s + 0.6s^2$. We
have $x = E z_1 = 1.6$ and $\sigma^2 = E(z_1 - x)^2 = 0.24$. For the asymptotic distribution,
$Ew = 1$, $E(w - 1)^2 = \dfrac{\sigma^2}{x^2 - x} = 0.25$. The number $\gamma = \log_{1.6}(1/0.4) =
1.9495$ so that $\psi(s)$, which is identical with $\varphi(s)$ in this case, is $O\!\left(\dfrac{1}{|s|^{1.9495}}\right)$ as
$|s|$ goes to $\infty$ with $\mathrm{Re}(s) \le 0$. This implies that the c.d.f. $H(u)$ and likewise
$G(u)$, since the two are equal here, behaves like $[1/\Gamma(1 + \gamma)]M(u)$ times $u^{1.9495}$
near $u = 0$, where the "behavior" is in the sense of Theorem 4.2. Numerical
determination of $M(u)$ would not be difficult. The number $\rho$ of Theorem 4.4
is given by $\log_{1.6} 2 = 1.4748$. This means that $\psi(s)$ is an entire function of order
1.4748 and hence that the density function $h(u)$ goes to zero more rapidly than
$e^{-u^{Q-\epsilon}}$ and less rapidly than $e^{-u^{Q+\epsilon}}$ for any $\epsilon > 0$, where $Q = \dfrac{\rho}{\rho - 1} = 3.1061$,
and "more rapidly" is used in the sense of Theorem 4.4.
The function $L(s) = \lim_{n \to \infty} \dfrac{\log k_n[\psi(s)]}{s^{\rho}\, 2^{n}}$ was computed for four values of $s$ between
$s = 1$ and $s = x = 1.6$; in each case the value was 0.744625 so that it appears
likely that here $L(s)$ is constant. Hence $C = \max L(s) = 0.744625$ and the
quantity $A$ defined by (4.4) is 0.26430. Thus the conjecture following theorem
4.4 indicates that $\int_0^{\infty} g(u)\, e^{(A \pm \epsilon) u^{3.1061}}\, du$, with $A = 0.26430$, is (divergent,
convergent) according as the + or − sign holds.
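
The constants quoted in this numerical illustration can be rechecked with a few lines of arithmetic. The sketch below assumes the relation $(C\rho)^{Q}(AQ)^{\rho} = 1$ of the conjecture, solved for $A$ as $A = \frac{1}{Q}\left(\frac{1}{C\rho}\right)^{1/(\rho-1)}$ (using $Q/\rho = 1/(\rho - 1)$), and takes $C = 0.744625$ from the text; the variable names are illustrative only.

```python
import math

# Case f(s) = 0.4 s + 0.6 s^2: x = f'(1), q1 = k'(0) = 0.4, degree d = 2.
x, q1, d = 1.6, 0.4, 2

gamma = math.log(1.0 / q1) / math.log(x)   # Definition 3.3: gamma = log_x(1/q1)
rho = math.log(d) / math.log(x)            # Definition 3.4: rho = log_x d
Q = rho / (rho - 1.0)                      # Theorem 4.4

C = 0.744625                               # value of L(s) reported in the text
# Conjecture (C*rho)^Q * (A*Q)^rho = 1, solved for A.
A = (1.0 / Q) * (1.0 / (C * rho)) ** (1.0 / (rho - 1.0))

sigma2 = (0.4 * 1 + 0.6 * 4) - x ** 2      # Var z_1 = E z_1^2 - x^2
var_w = sigma2 / (x ** 2 - x)              # variance of the limit variable w

if __name__ == "__main__":
    print(f"gamma={gamma:.4f} rho={rho:.4f} Q={Q:.4f} A={A:.5f} var_w={var_w:.4f}")
```

The values agree with those in the text to the precision quoted there; the small difference in the last digit of $Q$ comes from rounding $\rho$ before forming $\rho/(\rho-1)$.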

Through the kindness of Mr. Cecil Hastings of the Douglas Aircraft Company,
the c.d.f. $G(u)$ was computed for this case. The coefficients in the power series
expansion of $\varphi(s)$ were obtained from the functional equation (3.1) and $G(u)$
was then obtained by inverting $\varphi(it)$. The values of $G(u)$ are given in Table I.
6. Number of generations to extinction. It was pointed out in section 2 that
when $x \le 1$ the probability is 1 that $z_n = 0$ for some integer $n$. We assume
throughout section 6 that $x < 1$.


TABLE I
G(u), the limiting probability that $z_n/x^n \le u$ for the case $f(s) = 0.4s + 0.6s^2$

   u       G(u)
  0.00    .00000
  0.25    .04753
  0.50    .17275
  0.75    .34550
  1.00    .53117
  1.25    .69932
  1.50    .83042
  1.75    .91857
  2.00    .96781
  2.50    .99751
  3.00    .99993


DEFINITIONS 6.1. Let the random variable $N$ be the smallest integer $n$ such
that $z_{n+1} = 0$. Define the moment-generating function of $N$ by

$$\theta(s) = \sum_{n=0}^{\infty} e^{ns} P(N = n).$$

Clearly $P(N = n) = p_{n+1,0} - p_{n0}$, so that $\theta(s) = \sum_{n=0}^{\infty} e^{ns} (p_{n+1,0} - p_{n0})$.

DEFINITIONS 6.2. Let $b_n = 1 - p_{n+1,0}$, with $b_0 = 1 - p_0$. The numbers $b_n$
satisfy the recursive relation

(6.1) $b_{n+1} = 1 - f(1 - b_n)$.

Define the function $\theta_1(s)$ by

$$\theta_1(s) = \sum_{n=0}^{\infty} b_n e^{ns}.$$

We see that

(6.2) $\theta(s) = 1 + (e^{s} - 1)\theta_1(s)$,

so that it suffices to determine the function $\theta_1(s)$.

The function $\theta_1(s)$ belongs to a type which has been studied by Fatou [15]
and Lattès [16]. If we let $e^{s} = z$ we see that $\theta_1(z)$ is a power series whose coefficients
are successive iterates of the function $f^*(b) = 1 - f(1 - b)$; i.e., $b_{n+1} =
f^*(b_n) = f^*_{n+1}(b_0)$, where $f^*(0) = 0$, $f^{*\prime}(0) = x < 1$. It was shown by Fatou
that a function of this sort is meromorphic with poles at $s = -n \log x$, $n =
1, 2, \cdots$. An expansion for $\theta_1(s)$ in the form

$$\theta_1(s) = \frac{\mu_1 y_0}{1 - x e^{s}} + \frac{\mu_2 y_0^2}{1 - x^2 e^{s}} + \frac{\mu_3 y_0^3}{1 - x^3 e^{s}} + \cdots$$

was obtained by Lattès, the expansion converging everywhere except at the
poles. The quantities $\mu_r$ and $y_0$ are defined as follows: the function $\mu(s) =
\mu_1 s + \mu_2 s^2 + \mu_3 s^3 + \cdots$ is determined by the functional equation $\mu(sx) = f^*[\mu(s)]$
with the condition $\mu'(0) = \mu_1 = 1$. The number $y_0$ is determined by $\mu(y_0) =
b_0 = 1 - p_0$. Perhaps the easiest way to determine $y_0$ is to use the fact that
the inverse function $\mu^{-1}(s)$ satisfies the functional equation $\mu^{-1}[f^*(s)] = x\mu^{-1}(s)$,
from which we can determine the power series for $\mu^{-1}(b_0)$.
Since the use of Lattès' expansion requires finding the expansions of $\mu(s)$ and
$\mu^{-1}(s)$, we now give another method, giving a different kind of expansion; this
method appears particularly adapted to the case here illustrated, where $f(s)$
is of the second degree. Then (6.1) becomes

(6.3) $b_{n+1} = x b_n - p_2 b_n^2$, $b_0 = 1 - p_0$.

DEFINITION 6.3. The functions $\theta_k(s)$, $k = 1, 2, \cdots$, are given by

(6.4) $\theta_k(s) = \sum_{n=0}^{\infty} (b_n)^{k} e^{ns}$.

If we raise both sides of (6.3) to the $k$th power, multiply both sides by $e^{ns}$, sum
on $n$ from 0 to $\infty$, and solve for $\theta_k(s)$, we obtain

(6.5) $\theta_k(s) = \dfrac{b_0^{k} e^{-s} + \sum_{j=1}^{k} \binom{k}{j} (-p_2)^{j} x^{k-j}\, \theta_{k+j}(s)}{e^{-s} - x^{k}}.$

(Justification for the rearrangement of series will come out of the subsequent
proof.) If we put $k = 1$ in (6.5) we obtain

(6.6) $\theta_1(s) = \dfrac{b_0 e^{-s} - p_2 \theta_2(s)}{e^{-s} - x}.$

DEFINITIONS 6.4. We define recursively sequences of functions $S_n(s)$ and $R_n(s)$,
such that for each $n$, $\theta_1(s) = S_n(s) + R_n(s)$. Let

$$S_1(s) = \frac{b_0 e^{-s}}{e^{-s} - x}, \qquad R_1(s) = \frac{-p_2 \theta_2(s)}{e^{-s} - x}.$$

Suppose now that $R_n(s)$ is of the form $A_{n1}\theta_{n+1}(s) + \cdots + A_{nn}\theta_{2n}(s)$, the $A_{nj}$
being functions of $s$, $p_2$, and $x$, but not explicitly of $b_0$; while $S_n(s)$ is a rational
function of $e^{-s}$, $p_2$, and $x$, and a polynomial of degree $n$ in $b_0$. Now put
$k = n + 1$ in (6.5) and substitute the expression obtained for $\theta_{n+1}(s)$ into $R_n(s)$.
Collecting terms we now define $R_{n+1}(s)$ as the sum of terms involving $\theta_{n+2}(s), \cdots,
\theta_{2n+2}(s)$: $R_{n+1}(s) = A_{n+1,1}\theta_{n+2}(s) + \cdots + A_{n+1,n+1}\theta_{2n+2}(s)$; then $S_{n+1}(s) =
\theta_1(s) - R_{n+1}(s)$ is a rational function of $e^{-s}$, $p_2$, and $x$, and a polynomial of
degree $n + 1$ in $b_0$.
THEOREM 6.1. Let $f(s) = p_0 + p_1 s + p_2 s^2$, with $x < 1$. Suppose that
$x + p_2 b_0 < 1$. Then the functions $S_n(s)$ converge to $\theta_1(s)$ in a neighborhood of $s = 0$.

The restriction $x + p_2 b_0 < 1$ may fail to hold. However this is not a serious
restriction; we pick a value of $n$ so that $x + p_2 b_n < 1$. Then

$$\theta_1(s) = b_0 + \cdots + b_{n-1} e^{(n-1)s} + e^{ns} \theta_1^*(s),$$

where $\theta_1^*(s) = \sum_{j=n}^{\infty} b_j e^{(j-n)s}$ is the same type of function as $\theta_1(s)$; theorem 6.1
is then applicable to $\theta_1^*(s)$.
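If the conditions of theorem 6.1 are satisfied, we have

(6.7) $\theta_1(s) = b_0 e^{-s}\left[\pi_1(s, x) - p_2 b_0 \pi_2(s, x) + 2x p_2^2 b_0^2 \pi_3(s, x) - p_2^3 b_0^3 (e^{-s} + 5x^3)\pi_4(s, x) + \cdots\right]$

where $\pi_k(s, x) = \prod_{j=1}^{k} \dfrac{1}{e^{-s} - x^{j}}$. Since $E(N) = \theta'(0) = \theta_1(0)$ and $E(N^2) =
\theta''(0) = 2\theta_1'(0) + \theta_1(0)$, we have

$$E(N) = b_0\left[\pi_1(0, x) - p_2 b_0 \pi_2(0, x) + 2x p_2^2 b_0^2 \pi_3(0, x) - p_2^3 b_0^3 (1 + 5x^3)\pi_4(0, x) + \cdots\right],$$

$$E(N^2) = -E(N) + 2b_0\left[\tau_1(0, x) - p_2 b_0 \tau_2(0, x) + 2x p_2^2 b_0^2 \tau_3(0, x) - (5x^3 + 1) p_2^3 b_0^3 \tau_4(0, x) + p_2^3 b_0^3 \pi_4(0, x) + \cdots\right]$$

where $\tau_k(0, x) = \pi_k(0, x) \sum_{j=1}^{k} \dfrac{1}{1 - x^{j}}$.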
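
The expansion for $E(N)$ can be compared with a direct summation of the $b_n$. The sketch below is only an illustration: it assumes a quadratic $f(s) = p_0 + p_1 s + p_2 s^2$ with hypothetical values $p_0 = 0.5$, $p_1 = 0.4$, $p_2 = 0.1$ (so $x = 0.6$ and $x + p_2 b_0 < 1$), uses $E(N) = \theta_1(0) = \sum b_n$ with $b_{n+1} = x b_n - p_2 b_n^2$, and truncates the series at the four terms written out in the text. The function names are not from the paper.

```python
def b_sequence(p0, p1, p2, n_terms):
    """b_n = P(z_{n+1} != 0), from b_{n+1} = x*b_n - p2*b_n**2 with b_0 = 1 - p0."""
    x = p1 + 2.0 * p2
    b = [1.0 - p0]
    for _ in range(n_terms - 1):
        b.append(x * b[-1] - p2 * b[-1] ** 2)
    return b


def mean_generations_direct(p0, p1, p2, n_terms=2000):
    """E(N) = theta_1(0) = sum of the b_n (the terms decay geometrically for x < 1)."""
    return sum(b_sequence(p0, p1, p2, n_terms))


def mean_generations_series(p0, p1, p2):
    """First four terms of the expansion of E(N) in powers of p2."""
    x = p1 + 2.0 * p2
    b0 = 1.0 - p0

    def pi(k):
        # pi_k(0, x) = prod_{j=1}^{k} 1 / (1 - x**j)
        out = 1.0
        for j in range(1, k + 1):
            out /= (1.0 - x ** j)
        return out

    return b0 * (pi(1)
                 - p2 * b0 * pi(2)
                 + 2.0 * x * p2 ** 2 * b0 ** 2 * pi(3)
                 - p2 ** 3 * b0 ** 3 * (1.0 + 5.0 * x ** 3) * pi(4))


if __name__ == "__main__":
    args = (0.5, 0.4, 0.1)   # hypothetical p0, p1, p2
    print(mean_generations_direct(*args), mean_generations_series(*args))
```

For small $p_2 b_0$ the two values agree closely, the difference being of the order of the first omitted term.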


We now prove that if $x + p_2 b_0 < 1$, the expansion (6.7) is valid in some neighborhood
of $s = 0$. We shall denote the particular values of $x$, $p_2$, and $b_0$ with
which we are dealing by $\bar{x}$, $\bar{p}_2$, and $\bar{b}_0$. Now let $x$, $p_2$, and $b_0$ be three complex
numbers, arbitrary except for the following restrictions:

(6.8) $|x| + |p_2| < 1$, $|b_0| < 1$

and define the numbers $b_n$ in terms of $b_0$, $x$, and $p_2$, by means of (6.3), with
$\theta_k(s)$ defined by (6.4).

We first show that (6.7) is valid if (6.8) holds, and then show that the domain
of validity also includes the original numbers $\bar{x}$, $\bar{p}_2$, and $\bar{b}_0$, provided

$$\bar{x} + \bar{p}_2 \bar{b}_0 < 1.$$

If (6.8) is satisfied, we have $|b_n| < A|x|^{n}$ where $A$ is a positive constant.
Now suppose $1 < T < \dfrac{1}{|x|}$. Then the series defining $\theta_k(s)$, $k = 1, 2, \cdots$, are
uniformly and absolutely convergent in the domain $|e^{s}| \le T$. Moreover, if
$|x| + |p_2| = \lambda < 1$, we have $|b_n| \le |b_0|\lambda^{n}$ whence, if $k$ is an integer large
enough so that $T\lambda^{k} < \tfrac{1}{2}$,

(6.9) $|\theta_k(s)| < 2|b_0|^{k}$

for $|e^{s}| \le T$. In what follows, we assume $|e^{s}| \le T$. Now write $\theta_1(s) =
S_n(s) + \sum_{j=1}^{n} A_{nj}(p_2, x, s)\theta_{n+j}(s)$, where $n$ is large enough so that $T\lambda^{n} < \tfrac{1}{2}$.
Let $A_n(p_2, x, s) = \sum_{j=1}^{n} |A_{nj}(p_2, x, s)|$. Passing to the next stage we see
that $A_{n+1} \le A_n + \dfrac{T\lambda^{n+1} A_n}{1 - T\lambda^{n+1}} < A_n(1 + 2T\lambda^{n+1})$. Hence the numbers
$A_n$ are bounded. This fact, together with (6.9), shows that $\lim_{n \to \infty} R_n(s) = 0$.

Now suppose that $x$ and $b_0$ have their original values $\bar{x}$ and $\bar{b}_0$ while $p_2$ is small
enough in absolute value so that $\bar{x} + |p_2| < 1$. In this case $\lim_{n \to \infty} S_n(s) = \theta_1(s)$.
We observe that $S_n(s)$ is a polynomial of degree $n - 1$ in $p_2$ and that $S_{n+1}(s)$ is
obtained from $S_n(s)$ by adding a single term of degree $n$ in $p_2$. Thus $\theta_1(s)$ has
been expressed as a power series in $p_2$. Now consider $\theta_1(s)$ as a function of $p_2$,
with $b_0 = \bar{b}_0$, $x = \bar{x}$. If $\bar{x} + \bar{b}_0|p_2| < 1$, we have $b_n = O[(\bar{x})^{n}]$. Thus $\theta_1(s)$
is analytic in $p_2$ for $|p_2| < \dfrac{1 - \bar{x}}{\bar{b}_0}$ and the expansion in (6.7), being a power
series in $p_2$, must be valid when $\bar{x} + \bar{p}_2 \bar{b}_0 < 1$.


7. Estimation of parameters. Until now we have assumed that the parameters
$p_r$ are known numbers. We may wish, however, to estimate them, having
observed the numbers $z_1, z_2, \cdots, z_{n+1}$. In order to get simple maximum likelihood
estimates for the $p_r$, it appears necessary to introduce certain auxiliary
random variables.

DEFINITIONS 7.1. Let $z_{mk}$ be the number of individuals in the $m$th generation
who have exactly $k$ descendents in the $(m + 1)$st generation. Let $Z_n =
z_0 + z_1 + \cdots + z_n$.

THEOREM 7.1. Maximum likelihood estimates of $p_r$ and $x$, based on observed values
of $z_{mk}$ for $m \le n$, are respectively,

$$\hat{p}_r = \frac{1}{Z_n} \sum_{m=0}^{n} z_{mr}; \qquad \hat{x} = (Z_{n+1} - 1)/Z_n.$$

(Note that the estimate $\hat{x}$ involves only $z_1, \cdots, z_{n+1}$.)

If $z_m$ is fixed the joint conditional probability function of $z_{m0}, z_{m1}, \cdots$, is
$\left[ (z_m)! \prod_{r=0}^{\infty} p_r^{z_{mr}} \right] \Big/ \prod_{r=0}^{\infty} (z_{mr})!$. Thus the joint probability function of the $z_{mr}$ for
$m = 0, 1, \cdots, n$, and $r = 0, 1, 2, \cdots$, is given by the product of two factors,
one of which is independent of the $p_r$, the logarithm of the other being
$\sum_{r=0}^{\infty} \left( \sum_{m=0}^{n} z_{mr} \right) \log p_r$. The value of this expression is clearly maximized by
taking $p_r = \hat{p}_r$ as given above. Since $\sum_r z_{mr} = z_m$ and $\sum_r r z_{mr} = z_{m+1}$, the
quantity $\sum_r r\hat{p}_r$ gives $\hat{x}$ as above.
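
The estimates of Theorem 7.1 are easy to compute once the family-size counts $z_{mr}$ are tabulated. The following is a minimal sketch with hypothetical helper names: `simulate_family_counts` generates the counts for a process started from one individual, and `ml_estimates` forms $\hat{p}_r = \sum_m z_{mr}/Z_n$ and $\hat{x} = \sum_r r\hat{p}_r$, which equals $(Z_{n+1} - 1)/Z_n$ when $z_0 = 1$.

```python
import random


def simulate_family_counts(p, n, rng):
    """Return counts[m][r] = z_{mr} for generations m = 0..n, starting from z_0 = 1.

    p[r] is the probability that an individual has r descendents.
    """
    counts = []
    z = 1
    for _ in range(n + 1):
        row = [0] * len(p)
        for _ in range(z):
            r = rng.choices(range(len(p)), weights=p)[0]
            row[r] += 1
        counts.append(row)
        z = sum(r * c for r, c in enumerate(row))   # size of the next generation
    return counts


def ml_estimates(counts):
    """Maximum likelihood estimates (p_hat, x_hat) from the counts z_{mr}."""
    Z_n = sum(sum(row) for row in counts)           # Z_n = z_0 + ... + z_n
    totals = [sum(row[r] for row in counts) for r in range(len(counts[0]))]
    p_hat = [t / Z_n for t in totals]
    x_hat = sum(r * ph for r, ph in enumerate(p_hat))
    return p_hat, x_hat


if __name__ == "__main__":
    rng = random.Random(0)
    counts = simulate_family_counts([0.0, 0.4, 0.6], 12, rng)
    print(ml_estimates(counts))
```

As a small check by hand: if the single ancestor has two descendents, of whom one has none and one has one, then $z_0 = 1$, $z_1 = 2$, $z_2 = 1$, so $Z_1 = 3$, $Z_2 = 4$ and $\hat{x} = (4 - 1)/3 = 1$.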

Although the estimates $\hat{p}_r$ are the same as we would obtain if we were dealing
with $Z_n$ trials from a multinomial distribution with probabilities $p_r$, the joint
distribution of the quantities $\sum_{m=0}^{n} z_{mr}$, $r = 0, 1, \cdots$, is not multinomial. For
example, if $Z_n > 1$ the probability of the event

$$\left\{ \sum_{m=0}^{n} z_{m0} = Z_n, \quad \sum_{m=0}^{n} z_{mr} = 0 \text{ for } r \neq 0 \right\}$$

is 0.
m=0 m=0 


We shall next show that the estimate $\hat{x}$ is, in a certain sense, consistent.

THEOREM 7.2. If $x > 1$, the random variables $Z_{n+1}/Z_n$ converge in probability
to the random variable $xV^*$ where $V^* = \dfrac{1}{x}$ if $w = 0$ and $V^* = 1$ if $w \neq 0$.

If $w \neq 0$ then for all $n$, $z_n \neq 0$ and $1/Z_n \to 0$ as $n \to \infty$. Hence in this case
$(Z_{n+1} - 1)/Z_n$ converges to $x$ if $Z_{n+1}/Z_n$ does. On the other hand, $P(w = 0)
= a = P(z_n = 0 \text{ for some } n)$, so that if $w = 0$, $Z_{n+1}/Z_n = 1$ with probability 1 for
$n$ large enough. Thus we need only show that $Z_{n+1}/Z_n$ converges to $x$ if $x > 1$
and $w \neq 0$.

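We need the following:

LEMMA 7.1. If $x > 1$, the random variables $Z_n/x^{n}$ converge in probability to
$\dfrac{wx}{x - 1}$.

Since

(7.1) $\dfrac{wx}{x - 1} - \dfrac{Z_n}{x^{n}} = \dfrac{w}{(x - 1)x^{n}} + \displaystyle\sum_{r=0}^{n} \frac{w - w_r}{x^{n-r}},$

it will be sufficient to show that $\lim_{n \to \infty} \dfrac{1}{(x-1)^2 x^{2n}} E(w^2) = 0$ and
$\lim_{n \to \infty} E\left( \sum_{r=0}^{n} \dfrac{w - w_r}{x^{n-r}} \right)^2 = 0$. The truth of the first statement is obvious, since $Ew^2$
is finite. It follows from (2.5) that $E(w_r w_s) = Ew_r^2$ if $s > r$, $E(w w_r) = \lim_{s \to \infty}
E(w_s w_r) = Ew_r^2$, whence $E(w - w_r)^2 = \dfrac{\sigma^2}{(x^2 - x)x^{r}}$ and $E[(w - w_r)(w - w_s)] =
\dfrac{\sigma^2}{(x^2 - x)x^{s}}$, $s > r$. Hence

$$E\left( \sum_{r=0}^{n} \frac{w - w_r}{x^{n-r}} \right)^2 = \frac{\sigma^2}{x^2 - x} \left[ \sum_{r=0}^{n} \frac{1}{x^{2(n-r)} x^{r}} + 2 \sum_{r < s} \frac{1}{x^{2n-r-s} x^{s}} \right],$$

and this quantity clearly approaches 0 as $n \to \infty$, proving Lemma 7.1.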
Define the random variables $w^*$ and $V_n$ as

$w^* = w$ when $w \neq 0$

$w^* = 1$ when $w = 0$

$V_n = \dfrac{Z_n}{x^{n}}$ when $z_n \neq 0$

$V_n = \dfrac{x}{x - 1}$ when $z_n = 0$.

It is clear that the $V_n$ converge in probability to $\dfrac{w^* x}{x - 1}$, and we note that the
c.d.f. of $w^*$ is continuous at $w^* = 0$. Hence, for $\epsilon > 0$,

$$\lim_{n \to \infty} P\left( \left| \frac{V_{n+1}}{V_n} - 1 \right| \ge \epsilon \right) = \lim_{n \to \infty} P\left( |V_{n+1} - V_n| - \epsilon V_n \ge 0 \right) = P\left( -\frac{\epsilon\, w^* x}{x - 1} \ge 0 \right) = 0.$$

It follows, under the conditional hypothesis $w \neq 0$, that the variates $Z_{n+1}/Z_n$
converge in probability to $x$, since

$$\frac{Z_{n+1}}{Z_n} = x\, \frac{V_{n+1}}{V_n} \quad \text{when } z_{n+1} \neq 0.$$


8. Continuous models. As mentioned in section 1 there are situations where 
it is more important to consider the number of individuals existing at a given 
time than the number in a given generation. Let a set of probabilities p, be 
given. The question arises whether we can interpret these as probabilities that 
an individual will have a given number of descendents at the end of some fixed 
period of time. We might then suppose that each individual in existence at 
that time has the same probabilities of having a given number of descendents at 
the end of the next (equal) length of time, these probabilities being independent 
of the age of the individual. A model of this sort might be considered in certain 
fission processes, if the probability of fission is independent of age. It should 
be noted that the ‘‘descendents” of an individual may include the individual. 
For example, if a bacterium splits in two we may either regard it as having pro- 
duced two descendents and dying, or as having produced one descendent and 
itself surviving. 

If an interpretation of this sort is to be satisfactory, interpolation in time must
be possible. In other words there should exist a family of functions $f_n(s)$ defined
for all positive $n$ such that $f_{n_1}[f_{n_2}(s)] = f_{n_1 + n_2}(s)$; such that for each positive $n$,
$f_n(s)$ is a probability generating function, $f_n(s) = \sum_{r=0}^{\infty} p_r(n)s^{r}$; and such that for
$n = 0, 1, 2, \cdots$ the functions $f_n(s)$ coincide with the iterates $s, f(s), f[f(s)], \cdots$.
We may then interpret $f_n(s)$ as the generating function at time $n$. It is readily
seen that in general such a family of functions will not exist. For example, if
such a family exists we must have $f(s) = n$th iterate of $f_{1/n}(s)$ for arbitrarily large
integral $n$, so that $f(s)$ cannot be a polynomial of degree $\ge 2$.

The functional equation $\varphi(sx) = f[\varphi(s)]$ shows that $f(s) = \varphi[x\varphi^{-1}(s)]$, whence
$f_n(s) = \varphi[x^{n}\varphi^{-1}(s)]$ for integral $n$. The expression $\varphi[x^{n}\varphi^{-1}(s)]$ then might be
taken as the definition of $f_n(s)$ for all positive $n$. See Hadamard, [9]. The problem
of determining whether the functions so defined are a family of generating
functions will be discussed in a subsequent paper. We remark, however, that
if $f(s)$ has the form $\dfrac{s}{x - (x - 1)s}$ considered in section 5 then the iterates $f_n(s)$
have the form $\dfrac{s}{x^{n} - (x^{n} - 1)s}$; they are clearly generating functions for all positive
$n$, satisfying the required relation $f_{n_1}(f_{n_2}) = f_{n_1 + n_2}$. Now suppose $g(s)$
is some function such that the function $f(s) = g^{-1}\left[ \dfrac{g(s)}{x - (x - 1)g(s)} \right]$ is a generating
function for all $x > 1$, with $g(1) = 1$. As pointed out by Ulam and Hawkins,
the iterates of functions $f(s)$ of this form are convenient to work with, the $n$th
iterate being simply $g^{-1}\left[ \dfrac{g(s)}{x^{n} - (x^{n} - 1)g(s)} \right]$. In addition, the requirement that
$f(s)$ be a generating function for all $x > 1$ shows that the functions $f_n(s)$ are
generating functions for all $n > 0$. The simplest function $g(s)$ which satisfies our
requirements is $g(s) = s^{m}$, where $m$ is any positive integer. In this case $f(s)$
has the form $k(s)$ in (5.1) and $f_n(s) = s[x^{n} - (x^{n} - 1)s^{m}]^{-1/m}$. As $n \to 0$
we have

$$f_n(s) = \left( 1 - \frac{n \log x}{m} \right) s + \frac{n \log x}{m}\, s^{m+1} + O(n^2).$$

We may interpret this as follows. A particle in existence at a given time may, in
a short time interval $\Delta t$, either split into $m + 1$ particles, with probability
$\dfrac{\Delta t \log x}{m}$; or it may remain unaltered, with probability $1 - \dfrac{\Delta t \log x}{m}$. If it
splits, each particle produced has the same chances for splitting as its parent, etc.
Thus, from the results of section 5, it follows that if we begin with a single particle
at time $t = 0$, the asymptotic probability density function for $z_t/x^{t}$, where $z_t$ is
the number of particles at time $t$, is given by $\left( m^{-1/m} u^{(1/m) - 1} e^{-u/m} \right) / \Gamma\!\left( \dfrac{1}{m} \right)$.
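
The interpolation property for the family $f_n(s) = s[x^{n} - (x^{n} - 1)s^{m}]^{-1/m}$ can be checked numerically. The sketch below is illustrative only; the function names and the sample values $x = 1.6$, $m = 2$ are assumptions. It checks the composition rule $f_{n_1}[f_{n_2}(s)] = f_{n_1+n_2}(s)$ for non-integral $n$, the small-$n$ expansion, and the functional equation $\varphi(sx) = k[\varphi(s)]$ for $\varphi(s) = (1 - ms)^{-1/m}$ from section 5.

```python
import math


def f_n(n, s, x, m):
    """f_n(s) = s * [x**n - (x**n - 1) * s**m] ** (-1/m), for real n >= 0."""
    xn = x ** n
    return s * (xn - (xn - 1.0) * s ** m) ** (-1.0 / m)


def phi(s, m):
    """m.g.f. of the limiting distribution for this family (valid for s < 1/m)."""
    return (1.0 - m * s) ** (-1.0 / m)


def small_n_approx(n, s, x, m):
    """First-order expansion of f_n(s) in n."""
    rate = n * math.log(x) / m
    return (1.0 - rate) * s + rate * s ** (m + 1)


if __name__ == "__main__":
    x, m, s = 1.6, 2, 0.7            # hypothetical sample values
    print(f_n(0.5, f_n(0.25, s, x, m), x, m), f_n(0.75, s, x, m))
    print(f_n(1e-4, s, x, m), small_n_approx(1e-4, s, x, m))
    print(phi(0.1 * x, m), f_n(1, phi(0.1, m), x, m))
```

The composition rule holds because $1/g(f_n) - 1 = x^{n}\,[1/g(s) - 1]$ with $g(s) = s^{m}$, so that composing two members of the family multiplies the factors $x^{n_1}$ and $x^{n_2}$.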

It is, of course, customary to begin with the elementary probabilities for a
certain number of births in a short time $\Delta t$ and determine the functions $f_n(s)$
from these by means of differential equations. See, for example, Arley, [17].
The results of the present paper can be applied in some cases to the continuous
problem even when an explicit determination of the $f_n(s)$ is difficult. A discussion
will be given in a later paper.
9. Some proofs. We give in this section proofs for (A) theorem 3.3, (B)
theorem 3.4, (C) theorem 4.2, and (D) theorem 4.3; in certain cases we shall
indicate slightly more general results.

(A) We make use of a result of Koenigs, in the form applicable here.

KOENIGS' THEOREM: If $|s| \le \lambda < 1$ and $q_1 \neq 0$, then $k_n(s) = q_1^{n} B(s) \cdot
[1 + O(q_1^{n})]$ where $B(s)$ is analytic for $|s| \le \lambda$ and satisfies the functional equation
$B[k(s)] = q_1 B(s)$.

Here, $O(q_1^{n})$ means bounded by $Aq_1^{n}$, where $A$ is independent of $s$. We remark
that $B(s) \not\equiv 0$. The proof of Koenigs' theorem follows readily if we write

$$k_n(s) = q_1^{n-1} k(s) \prod_{j=1}^{n-1} \left\{ 1 + \frac{\epsilon[k_j(s)]}{q_1} \right\}, \quad \text{where } \epsilon(s) = \frac{k(s)}{s} - q_1.$$

Now let $t_1$ be a positive number such that $|\psi(s)| < 1$ when $0 < |s| \le t_1$ and
$\mathrm{Re}(s) \le 0$. (For the rest of this proof we assume $\mathrm{Re}(s) \le 0$.) Such a number
exists; on the imaginary axis we have $\psi(it) = 1 + it - \tfrac{1}{2}E[(w')^2]t^2 + o(t^2)$ where
$E[(w')^2] > 1$, $w'$ having the distribution branching from $k(s)$, showing that
$|\psi(it)| < 1$ if $t \neq 0$ and sufficiently small; while if $\mathrm{Re}(s) < 0$ we refer to the
expression $\psi(s) = \int_0^{\infty} e^{su}\, dH(u)$. Let $\lambda = \max |\psi(s)|$ for $t_1/x \le |s| \le t_1$.

If $|s| > t_1$ let $N(s)$ be the smallest integer such that $|s|/x^{N(s)} \le t_1$. Then
$\psi(s) = k_{N(s)}[\psi(s/x^{N})] = q_1^{N} B[\psi(s/x^{N})][1 + O(q_1^{N})] = B[\psi(s)][1 + O(q_1^{N})]$.
Now $B[\psi(sx)] = q_1 B[\psi(s)]$. Let $M(s) = |s|^{\gamma} B[\psi(s)]$. Then $M(sx) = M(s)$.
Also $\log_x |s/t_1| \le N(s) < 1 + \log_x |s/t_1|$, and theorem 3.3 follows. Clearly
$M(s)/|s|^{\gamma}$ is continuous for $t_1/x \le |s| \le t_1$, and hence, by functional continuation,
wherever $\mathrm{Re}(s) \le 0$, $s \neq 0$.

Concerning the remarks following Theorem 3.3 we have the following:

(a) If $E z_1^{r} < \infty$, $r$-fold differentiation of $\psi(sx^{n}) = k_n[\psi(s)]$ gives, for $|s| >
t_1 > 0$,

(9.1) $\psi^{(r)}(s) = \dfrac{1}{x^{nr}} \displaystyle\sum_{j=1}^{r} Q_{rj}\, k_n^{(j)}\!\left[ \psi\!\left( \frac{s}{x^{n}} \right) \right],$

where $Q_{rj}$ is a polynomial in $\psi'\!\left( \dfrac{s}{x^{n}} \right), \cdots, \psi^{(r)}\!\left( \dfrac{s}{x^{n}} \right)$. Now $|k_n(s)| = O(q_1^{n})$
when $|s| \le \lambda$; because of analyticity, the same must be true of $|k_n^{(j)}(s)|$.
Put $n = N(s)$ in (9.1), $N(s)$ being the integer defined above. Since $k_N^{(j)}[\psi(s/x^{N})] =
O(q_1^{N}) = O(1/|s|^{\gamma})$, remark (a) follows.

(b) $B(s)$ is clearly $> 0$ when $s > 0$; hence $M(s) > 0$ when $s < 0$. Since $B(0) =
0$, $B(s) \neq 0$ for sufficiently small $s \neq 0$; since $\psi(s) \to 0$ as $|s| \to \infty$, $M(s) \neq 0$
for $|s|$ sufficiently large; since $M(sx) = M(s)$, remark (b) follows.

(c) If $\gamma = \infty$, i.e., $q_1 = 0$, then $k_n(s)$ goes to zero with great rapidity as $n \to \infty$,
if $|s| < 1$. The general line of argument is clear.

(B) Let $k(s)$ be a polynomial of degree $d > 1$ with real coefficients, $k(s) =
q_0 + q_1 s + \cdots + q_d s^{d}$, with a non-negative double point, $k(a) = a \ge 0$, and such that
$k(s) > s$ when $s > a$. Let $\psi(s)$ be any solution of the functional equation $\psi(ms) =
k[\psi(s)]$ which is continuous for $s > 0$ and satisfies $\psi(s) > a$ for $s > 0$; here $m$ is any
number $> 1$. Then theorem 3.4 holds, with $x$ replaced by $m$.

It is not difficult to show that if $a < s_1 \le s < \infty$, $\lim_{j \to \infty} k_j(s) = \infty$ uniformly in $s$.
Hence $\psi(s) \to \infty$ as $s \to \infty$. Write $R(s) = \log\left( 1 + \dfrac{\sum_{i < d} q_i s^{i}}{q_d s^{d}} \right)$. Then

$$d^{-n} \log \psi(sm^{n}) = d^{-n} \log k_n[\psi(s)] = (1 - d^{-n}) \frac{\log q_d}{d - 1} + \log \psi(s) + \sum_{j=1}^{n} d^{-j} R(k_{j-1}[\psi(s)]),$$

$s$ being taken large enough so that $R(k_{j-1}[\psi(s)])$ is continuous.
Thus, since the functions $R(k_{j-1}[\psi(s)])$ are bounded, the functions $d^{-n} \log \psi(sm^{n})$
converge uniformly, for $s$ sufficiently large, to a continuous function $L^*(s)$ satisfying
$L^*(ms) = dL^*(s)$. Let $L(s) = s^{-\rho}L^*(s)$, where $\rho = \log_m d$. Theorem 3.4
now follows by an argument similar to that used to conclude theorem 3.3.
(Note that $\sum_{j > n} d^{-j} R(k_{j-1}[\psi(s)]) = O(d^{-n})$.)

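(C) In order to avoid negative signs we work with the Laplace transform instead
of the m.g.f.

Let $H(u)$ be nondecreasing on $(0, \infty)$ with $H(0) = 0$; let $\psi(s) = \int_0^{\infty} e^{-su}\, dH(u)$
be finite for $s > 0$. Suppose $\psi(s) = \dfrac{M(s)}{s^{\gamma}} + o\!\left( \dfrac{1}{s^{\gamma}} \right)$ as $s \to \infty$, where $0 < \gamma < \infty$,
$M(s)$ is continuous and satisfies $M(sx) = M(s)$ for $s > 0$, $x$ being some number
$> 1$. Then

$$\lim_{u \to 0+} \int_u^{ux} \frac{H(v)}{v^{\gamma+1}}\, dv = \frac{1}{\Gamma(\gamma + 1)} \int_1^x \frac{M(v)}{v}\, dv.$$

Following the lines of the proof of Karamata's theorem, we see that
$\int_y^{yx} s^{\gamma - 1} \psi(s)\, ds = D + o(1)$ as $y \to \infty$, where $D = \int_1^x \dfrac{M(s)}{s}\, ds$; i.e.,
$\int_y^{yx} s^{\gamma - 1}\, ds \int_0^{\infty} e^{-su}\, dH(u) = D + o(1)$, or replacing $s$ by $(n + 1)s$,

$$\int_y^{yx} s^{\gamma - 1}\, ds \int_0^{\infty} e^{-(n+1)su}\, dH(u) = \frac{D}{(n + 1)^{\gamma}} + o(1) = \frac{D}{\Gamma(\gamma)} \int_0^{\infty} e^{-(n+1)s} s^{\gamma - 1}\, ds + o(1).$$

It follows as in [14], pp. 189-192, that if $F(w)$ is any function of bounded variation
in $(0, 1)$ we have

(9.2) $\displaystyle\lim_{y \to \infty} \int_y^{yx} s^{\gamma - 1}\, ds \int_0^{\infty} e^{-su} F(e^{-su})\, dH(u) = \frac{D}{\Gamma(\gamma)} \int_0^{\infty} e^{-s} F(e^{-s}) s^{\gamma - 1}\, ds.$

Let $F(e^{-s}) = e^{s}$ if $0 \le s \le 1$ and 0 otherwise. Then the theorem follows from
(9.2).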

(D) Theorem 4.3 is true if F(u) is any bounded monotone increasing function. 
For simplicity we assume that F(1) = 0; it is readily seen that this causes no 
loss in generality. The proof is given for the case 1 < p < ™; it will be clear 
that p = 1 implies Q = ~, while if p = © (or if &(s) is not entire) Q = 

Suppose m and n are positive integers such that m/n < p/(p — 1). Then 
(9.3) [ exp (w™") dF(u) = > y{ 4 (rl) GF(u) <1 E [(r + 1m! 
1 wn ty THOT. I =0 


(rn)! 





C(rti)m 
(0)


where & = a 


; interchange of integration and summation are justified by 


_ . n 1 
the positiveness of all terms involved. Suppose 0 < « < a ( - :) ; for k 
p 


sufficiently large the inequality c, < k““"/”~®° is satisfied; see [18], p. 253. 
Hence using Stirling’s formula, we see that the last series in (9.3) is dominated 
by a series whose rth term, for r sufficiently large, is controlled by the factor 


1—(1/p)+e—(n/m)) ‘ 1 n. : ‘ 
pa Mire saim™) =Sincel — - + ¢€— m3 negative, the series, and hence the 
p 


integral, converges. We have thus proved 7 + : «< 1. 
p 


m—1 oo 
Conversely, suppose ” > : ag . Let és) = >» £.(s), where &(s) = > Crsrm* 
a k=0 r=0 


*"" k =0,1,---,m—1. At least one of the functions &(s) must be of order 
p. We suppose that £(s) is; if not the argument would need only slight modi- 
fications. We have 


weer min — (rm)! Com 
(9.4) [ exp (u”") dF(u) > n dX ee Dall" 


rm(1/p+e) 


1 
Suppose 0 <e <1 — _ = From [18], p. 253, the inequality c,m > (rm) 
must hold for infinitely many values of r._ As in the first half of the proof this 


1 1 
shows that the series and the integral in (9.4) diverge. Thus i + Q > land 


the proof is complete. 

If p is rational, the conjecture following theorem 4.3 can be proved in a similar 
manner making use of a relation between the class of an entire function and the 
coefficients of its series expansion; see [14], p. 95. 
MOST POWERFUL TESTS OF COMPOSITE HYPOTHESES. I. NORMAL DISTRIBUTIONS


By E. L. LEHMANN AND C. STEIN 


University of California, Berkeley 


Summary. For testing a composite hypothesis, critical regions are deter- 
mined which are most powerful against a particular alternative at a given level 
of significance. Here a region is said to have level of significance $\epsilon$ if the probability
of the region under the hypothesis tested is bounded above by $\epsilon$. These
problems have been considered by Neyman, Pearson and others, subject to the 
condition that the critical region be similar. In testing the hypothesis specify- 
ing the value of the variance of a normal distribution with unknown mean against 
an alternative with larger variance, and in some other problems, the best similar 
region is also most powerful in the sense of this paper. However, in the analo- 
gous problem when the variance under the alternative hypothesis is less than 
that under the hypothesis tested, in the case of Student’s hypothesis when the 
level of significance is less than ½, and in some other cases, the best similar region
is not most powerful in the sense of this paper. There exist most powerful tests 
which are quite good against certain alternatives in some cases where no proper 
similar region exists. These results indicate that in some practical cases the 
standard test is not best if the class of alternatives is sufficiently restricted. 


1. Introduction. The problem to be discussed in this paper is that of testing 
a composite hypothesis against a simple alternative. More specifically let $\mathcal{F} = \{f\}$
be a family of probability density functions defined over a Euclidean space $R_n$,
and let $g$ be a probability density function not in $\mathcal{F}$. We wish to test the
hypothesis $H_0$ that the random variable $X = (X_1, \dots, X_n)$ is distributed according
to a density $f$ of $\mathcal{F}$ against the alternative $H_1$ that $X$ is distributed according to
$g$. By a test we mean a region of rejection, $w$ in $R_n$.

Neyman and Pearson, in the fundamental paper [1] which laid the groundwork 
of the theory of optimum tests, restricted their considerations to similar regions. 
They considered a region (set) $w$ to be optimum for the given level of significance
$\epsilon$ if it maximizes the power

(1)  $\int_w g(x)\,dx$

subject to the restriction

(2)  $\int_w f(x)\,dx = \epsilon$  for all $f$ in $\mathcal{F}$.


As Neyman, Wald and others have pointed out, it is more natural to replace 
the condition of similarity (2) by the weaker restriction 


(3)  $\int_w f(x)\,dx \le \epsilon$  for all $f$ in $\mathcal{F}$.

A region $w$ maximizing (1) subject to (3) is called most powerful against the alternative
$g$ at the level of significance $\epsilon$. Here and throughout the paper, all functions
and sets are assumed to be Borel measurable.

In the present paper we shall consider certain composite hypotheses, and derive 
tests for them which are most powerful against a simple alternative. For the 
cases in which these tests coincide with the standard similar regions it will thus 
be established that no further increase in power is possible with tests of fixed 
sample sizes. In the more usual situation where the most powerful test depends 
strongly on the specific alternative chosen, no such absolute justification of the 
standard test is possible. In these cases, any justification must take account 
of the fact that it is desired to obtain good power against a large class of alterna- 
tives. This can be done, for instance, by using Wald’s definition of a most stringent
test [2] or his concept of minimizing the maximum risk.¹ If, on the other
hand, the class of alternatives is sufficiently restricted, the results of the present 
paper indicate that for small samples there may exist a test which is appreciably 
better than the standard test. 

Frequently the probability of an error of the first kind is an analytic function 
of a nuisance parameter for every choice of critical region. Hence, if it is known 
that some nuisance parameter $\theta$ lies, say in a certain finite interval $I$, then any
test which is similar for $\theta$ in $I$ will be similar for all $\theta$. Consequently, the
knowledge concerning $\theta$ cannot be used to find a more powerful test. On the other
hand, as is indicated at the end of section 5, restrictions of the nuisance parame- 
ters may, for small samples, lead to considerably more powerful tests if the con- 
dition of similarity is replaced by the weaker condition (3). 

There is one class of problems to which it may be desirable to apply the method 
of the present paper regardless of sample size; namely, if no similar region exists. 
Suppose, for instance, that $X_1, \dots, X_n$ are known to be normally and independently
distributed, $X_i$ having unknown mean $\xi_i$ and variance $\sigma_i^2$ for
$i = 1, \dots, n$. For testing the hypothesis

$H_0: \sigma_i = 1$,  $(i = 1, \dots, n)$

no similar region exists, while it is easy to see that against any simple alternative

$H_1: \sigma_i = \sigma_{i1} < 1$,  $\xi_i = \xi_{i1}$,

there exists a test which satisfies condition (3) and which has good power against
$H_1$ provided the $\sigma_{i1}$ are sufficiently small.

The present first part of this paper is restricted to hypotheses concerning 
normal distributions. It is intended to extend the considerations to exponential 





1 In an unpublished paper, it is shown by G. Hunt and C. Stein that the traditional test 
is most stringent in several cases, including the (univariate) linear hypothesis and the 
hypothesis specifying the ratio of the variances of two normal distributions. These results 
can be extended to analogous problems for distributions other than the normal, and similar 
results can be proved regarding minimization of the maximum risk if the weight function 
has a certain type of symmetry. 
and rectangular distributions, to consider non-parametric problems and possibly
also more complicated problems connected with normal distributions, in
later parts of the paper. 


2. Sufficient conditions for a most powerful test. The method which will be 
used in this paper to obtain most powerful tests is an adaptation of the funda- 
mental lemma of Neyman and Pearson [1]. At the same time it is essentially 
a special case of much more general results of Wald [3, 4], although the exact
conditions of Wald’s investigation are not satisfied in most of our problems. 

Let $h$ and $g$ be two functions defined over $R_n$, let $k$ be a constant and let $w$
be a region in $R_n$ such that

(4)  $g(x) \ge k\,h(x)$ in $w$;
     $g(x) \le k\,h(x)$ in $R_n - w$.

Then if $w'$ is such that

(5)  $\int_{w'} h(x)\,dx \le \int_w h(x)\,dx$,

it follows as in the fundamental lemma, where in (5) equality is assumed instead
of inequality, that

(6)  $\int_{w'} g(x)\,dx \le \int_w g(x)\,dx$.
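
The inequality (6) can be checked by brute force on a small finite sample space. The sketch below is only an illustration under assumed numbers: the five-point weights `h` and `g`, the constant `k = 1.5`, and the helper names are hypothetical and do not come from the paper. It builds the region of (4) and enumerates every competing region that satisfies (5).

```python
from itertools import combinations

def mass(weights, region):
    """Total weight of the sample points in region."""
    return sum(weights[i] for i in region)

def likelihood_ratio_region(g, h, k):
    """Region w of (4): all points with g(x) >= k * h(x)."""
    return {i for i in range(len(g)) if g[i] >= k * h[i]}

def best_mass_by_enumeration(g, h, bound, tol=1e-12):
    """Largest g-mass over all regions w' whose h-mass is at most bound, as in (5)."""
    best = 0.0
    for r in range(len(g) + 1):
        for region in combinations(range(len(g)), r):
            if mass(h, region) <= bound + tol:
                best = max(best, mass(g, region))
    return best

# Hypothetical five-point sample space; the numbers are illustrative only.
h = [0.30, 0.25, 0.20, 0.15, 0.10]
g = [0.05, 0.10, 0.20, 0.25, 0.40]
w = likelihood_ratio_region(g, h, k=1.5)
best = best_mass_by_enumeration(g, h, bound=mass(h, w))
print(sorted(w), mass(g, w), best)
```

In this toy case no admissible region carries more $g$-mass than $w$, which is what (6) asserts.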


Throughout the present paper we shall be concerned with the special case in
which $\mathcal{F}$ is an $s$-parameter family. We may denote the members of $\mathcal{F}$ by $f_\theta$, and
we shall obtain all members of $\mathcal{F}$ as $\theta$ ranges over a set $\omega$ in an $s$-dimensional
Euclidean space. In the theorem which we shall now state, we shall be concerned
with point functions $\lambda$ defined over $\omega$. We shall assume that $\lambda = c\mu$ where $c$
is a positive constant and $\mu$ a cumulative distribution function.² Also we suppose
that $f_\theta(x)$ is a measurable function of $x$ and $\theta$ jointly. However, the theorem
is also valid if $\omega$ is an abstract space and $\lambda$ a (finite) non-negative additive
set function (measure) over $\omega$. Such more general interpretation may be required
when applying the theory to non-parametric problems.

THEOREM 1. Let $H_0$ be the hypothesis that the random variable $X$ is distributed
according to a density function $f_\theta$ with $\theta$ in $\omega$, and let $H_1$ denote the alternative that $X$
is distributed according to a density $g$. Let $\lambda$ be a function defined over $\omega$ and such
that

(7)  $\lambda = c\mu$,


² The introduction of the distribution $\mu$ is simply a mathematical device and does not
imply that $\theta$ is a random variable (see Wald [16] p. 282).
where $c$ is a positive constant and $\mu$ a cumulative distribution function. Let $k$ be a
constant and let $w$ be a region in $R_n$ such that

(8)  $g(x) \ge k \int_\omega f_\theta(x)\,d\lambda(\theta)$ in $w$;
     $g(x) \le k \int_\omega f_\theta(x)\,d\lambda(\theta)$ in $R_n - w$.

Suppose that $w$ is of level of significance $\epsilon$ for testing $H_0$ against $H_1$, that is, that

(9)  $\int_w f_\theta(x)\,dx \le \epsilon$  for all $\theta$ in $\omega$,

and suppose that the subset of $\omega$ for which

(10)  $\int_w f_\theta(x)\,dx < \epsilon$

has $\lambda$-measure zero. Then $w$ is most powerful for testing $H_0$ against $H_1$ at level of
significance $\epsilon$.


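PROOF. Without loss of generality we shall assume $c = 1$. Let $w'$ be any
test of level of significance $\epsilon$. Then

(11)  $\int_{w'} f_\theta(x)\,dx \le \epsilon$  for all $\theta$ in $\omega$,

and because of (7)

(12)  $\int_\omega \left\{ \int_{w'} f_\theta(x)\,dx \right\} d\lambda(\theta) \le \epsilon \int_\omega d\lambda(\theta) = \epsilon$.

Since $\lambda$ is of bounded variation we may interchange the order of integration in
(12) and obtain

(13)  $\int_{w'} h(x)\,dx \le \epsilon$,

where

(14)  $h(x) = \int_\omega f_\theta(x)\,d\lambda(\theta)$.

From (9) and the condition surrounding (10) it follows that

(15)  $\int_\omega \left\{ \int_w f_\theta(x)\,dx \right\} d\lambda(\theta) = \epsilon$,

and therefore that

(16)  $\int_w h(x)\,dx = \epsilon$.

Thus $w$ and $w'$ satisfy conditions (4) and (5), and hence also (6), which completes
the proof.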
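
A discrete toy case may help to see how the conditions of theorem 1 fit together. The sketch below assumes a hypothetical null family of two densities on four points, a hypothetical alternative `g`, and a $\lambda$ that puts all its mass on the first null density, so that $h$ in (14) is that density. None of these numbers appear in the paper. The code forms the region (8), checks (9) and the condition around (10), and compares the power of $w$ with every other non-randomized test of the same level.

```python
from itertools import combinations

def prob(density, region):
    """Probability of region under a density given as a list of point masses."""
    return sum(density[i] for i in region)

# Hypothetical composite null (two densities) and simple alternative on four points.
null_family = {"theta1": [0.25, 0.25, 0.25, 0.25],
               "theta2": [0.10, 0.20, 0.30, 0.40]}
g = [0.40, 0.30, 0.20, 0.10]
eps = 0.25
tol = 1e-12

# lambda has its single point of increase at theta1, so h = f_theta1 as in (14).
h = null_family["theta1"]
k = 1.5
w = {i for i in range(len(g)) if g[i] >= k * h[i]}          # region (8)

# Condition (9): size at most eps for every theta in the null family.
level_ok = all(prob(f, w) <= eps + tol for f in null_family.values())
# Condition around (10): size equals eps at the point of increase of lambda.
attains_eps = abs(prob(h, w) - eps) < tol

# Brute-force maximum power over all non-randomized tests of level eps.
best_power = 0.0
for r in range(len(g) + 1):
    for region in combinations(range(len(g)), r):
        if all(prob(f, region) <= eps + tol for f in null_family.values()):
            best_power = max(best_power, prob(g, region))

print(sorted(w), prob(g, w), best_power)
```

Here the size of $w$ is strictly below $\epsilon$ at `theta2`, which is allowed because that parameter point carries no $\lambda$-mass.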
It is useful to notice that the assumptions of theorem 1 will be satisfied provided

$\int_w f_\theta(x)\,dx$

attains its maximum $\epsilon$ at all points of increase of $\lambda$, and therefore in particular
whenever $w$ is a similar region of size $\epsilon$.

We shall in many problems exhibit a function \ which satisfies the conditions 
of theorem 1 without giving the reasons which led us to this function. However 
the following comments concerning the tentative process that we used, may be 
helpful. One may first examine the known most powerful similar region. If 
there exists a cumulative distribution function $\lambda$ such that (8) is the most powerful
similar region, the problem is solved. If the most powerful similar region
cannot even be approximated by (8) with a sequence of $\lambda$’s, it is reasonable to
conclude that the most powerful test is not similar. Because the probability 
(under the null hypothesis) of any test is in all the problems considered here an 
analytic function of the parameter, this implies that the probability (under the 
null hypothesis) of the most powerful test attains its maximum at an at most 
denumerable (in some cases finite) set of points. In all the cases of this kind 
which we considered in the present part I, it was then possible to prove the 
existence of a function $\lambda$ with a single point of increase, which satisfied the
conditions of theorem 1.

A theorem analogous to theorem 1 holds for most powerful similar regions. 
Let $H_0$ and $H_1$ be as before and let $\lambda$ be a function of bounded variation not
necessarily non-decreasing. Let $w$ be a region in $R_n$ such that

(17)  $g(x) \ge k \int_\omega f_\theta(x)\,d\lambda(\theta)$ in $w$;
      $g(x) \le k \int_\omega f_\theta(x)\,d\lambda(\theta)$ in $R_n - w$.

Let $w$ be a similar region of level of significance $\epsilon$ for testing $H_0$ against $H_1$, that
is, let

(18)  $\int_w f_\theta(x)\,dx = \epsilon$  for all $\theta$ in $\omega$;

then $w$ is a most powerful similar region for testing $H_0$ against $H_1$.

For all the problems considered in this paper we shall prove the existence of 
functions \ satisfying the conditions of theorem 1, but we have not investigated 
the corresponding existence problem in general. On the other hand one verifies 
easily that for many of the cases treated here in which the most powerful test is 
not similar, the method for obtaining most powerful similar regions does not 
apply. However, for all the problems considered in the present paper the most 
powerful similar tests can be obtained easily by other methods [1, 5, 6, 7, 8].
For most of the problems the corresponding derivations have been carried out 
in the literature. 
Although we restrict ourselves in the present paper to the problem of maximizing
the power at a single alternative, theorem 1 clearly also applies to the more
general problem of maximizing the average power over surfaces in a space of 
alternatives. Such problems have been considered from the point of view of 
similar regions by Wald, Hsu and others [9, 10, 11]. 


3. Testing the values of one or several variances. Let $X_1, \dots, X_n$ be a sample
from a normal population with mean $\xi$ and variance $\sigma^2$, both unknown. We
want to test the hypothesis $H_0$ that $\sigma = \sigma_0$ against the simple alternative that
$\sigma = \sigma_1$, $\xi = \xi_1$. We shall show that the most powerful test for $H_0$ against $H_1$
is

(19)  $\sum (x_i - \xi_1)^2 \le k$  when $\sigma_1 < \sigma_0$;
(20)  $\sum (x_i - \bar{x})^2 \ge c$  when $\sigma_1 > \sigma_0$,

where $k$ and $c$ are determined by the level of significance. Thus the best similar
region is most powerful if the variance under the alternative is greater than that
under the null hypothesis, while the most powerful tests against the other alternatives
are not similar. That the region $\sum (x_i - \bar{x})^2 \ge c$ $(\le c')$ is most powerful
of all similar regions against $\sigma_1 > \sigma_0$ $(\sigma_1 < \sigma_0)$ was shown by Neyman and Pearson
[1].

We consider first the case $\sigma_1 < \sigma_0$, and apply theorem 1 with $\lambda$ a step function
having a single jump at $\xi_1$, that is,

(21)  $\lambda(\xi) = 0$ if $\xi < \xi_1$;  $\lambda(\xi) = 1$ if $\xi \ge \xi_1$.

The region $w$ given by (8) thus becomes

(22)  $\frac{\exp\left[-\frac{1}{2\sigma_1^2} \sum (x_i - \xi_1)^2\right]}{\exp\left[-\frac{1}{2\sigma_0^2} \sum (x_i - \xi_1)^2\right]} \ge k'$,

which is equivalent to

(23)  $\sum (x_i - \xi_1)^2 \le k$,

since $\sigma_1 < \sigma_0$. The size of the region (23), that is, its probability under the null
hypothesis, is a function of $\xi$ and clearly attains its maximum when $\xi = \xi_1$. Thus
all conditions of theorem 1 are satisfied provided we choose $k$ so that the maximum
size of (23) equals $\epsilon$.
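
To get a feeling for how much the non-similar region (23) can gain over the best similar region when $\sigma_1 < \sigma_0$, the following sketch works out both powers in closed form for the special case $n = 2$. This case is chosen here only because the chi-square distributions with 1 and 2 degrees of freedom have elementary distribution functions. The function names and the numerical values ($\epsilon = 0.05$, $\sigma_0 = 1$, $\sigma_1 = 0.5$) are assumptions made for illustration.

```python
from statistics import NormalDist

def power_nonsimilar(eps, sigma0, sigma1):
    """Power of sum((x_i - xi1)^2) <= k for n = 2, with size eps at xi = xi1."""
    # At xi = xi1 the statistic is sigma^2 times a chi-square(2) variable, whose
    # CDF is 1 - exp(-t/2). Size eps gives exp(-k / (2 sigma0^2)) = 1 - eps, and
    # replacing sigma0 by sigma1 in the exponent gives the power below.
    return 1.0 - (1.0 - eps) ** (sigma0 ** 2 / sigma1 ** 2)

def power_similar(eps, sigma0, sigma1):
    """Power of the similar region sum((x_i - xbar)^2) <= c' for n = 2."""
    # For n = 2 the statistic is sigma^2 * Z^2 with Z standard normal, so
    # P(stat <= c') = 2 * Phi(sqrt(c') / sigma) - 1.
    z = NormalDist().inv_cdf((1.0 + eps) / 2.0)      # equals sqrt(c') / sigma0
    return 2.0 * NormalDist().cdf(z * sigma0 / sigma1) - 1.0

eps, sigma0, sigma1 = 0.05, 1.0, 0.5                 # illustrative values only
p_nonsimilar = power_nonsimilar(eps, sigma0, sigma1)
p_similar = power_similar(eps, sigma0, sigma1)
print(p_nonsimilar, p_similar)
```

Under these assumed values the region centered at $\xi_1$ has noticeably higher power than the similar region, in line with the statement that the best similar region is not most powerful when $\sigma_1 < \sigma_0$.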
Before considering the case $\sigma_1 > \sigma_0$ we state for later reference the following:
LEMMA 1. If $\sigma_1 > \sigma_0$ there exists an absolutely continuous non-decreasing function
$\lambda$ of bounded variation such that

(24)  $\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2\sigma_0^2}(t - \xi)^2\right] d\lambda(\xi) = C \exp\left[-\frac{1}{2\sigma_1^2}(t - \xi_1)^2\right]$.

This follows immediately from the well known representation of $\exp(-t^2)$
as a Laplace transform by applying a translation, and is easily verified directly
by substituting

(25)  $\lambda'(\xi) = \exp\left[-\frac{1}{2(\sigma_1^2 - \sigma_0^2)}(\xi - \xi_1)^2\right]$.
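
The direct verification mentioned for lemma 1 can also be carried out numerically. The sketch below assumes that (24) reads $\int \exp[-(t-\xi)^2/(2\sigma_0^2)]\,d\lambda(\xi) = C \exp[-(t-\xi_1)^2/(2\sigma_1^2)]$ and that (25) is a normal kernel with variance $\sigma_1^2 - \sigma_0^2$ centered at $\xi_1$. The integration rule, the step count, and the parameter values are choices made here for illustration.

```python
from math import exp, pi, sqrt

def lemma1_lhs(t, xi1, s0, s1, half_width=40.0, steps=20000):
    """Midpoint-rule value of the integral in (24), with lambda' taken from (25)."""
    tau2 = s1 ** 2 - s0 ** 2            # positive because s1 > s0
    lo = xi1 - half_width
    dx = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps):
        xi = lo + (j + 0.5) * dx
        total += exp(-(t - xi) ** 2 / (2.0 * s0 ** 2)) * exp(-(xi - xi1) ** 2 / (2.0 * tau2))
    return total * dx

def lemma1_rhs_shape(t, xi1, s1):
    """Right hand side of (24) without the constant C."""
    return exp(-(t - xi1) ** 2 / (2.0 * s1 ** 2))

xi1, s0, s1 = 0.5, 1.0, 2.0             # illustrative values with s1 > s0
ratios = [lemma1_lhs(t, xi1, s0, s1) / lemma1_rhs_shape(t, xi1, s1)
          for t in (-1.0, 0.0, 2.0)]
print(ratios)
```

The three ratios agree, so the left side of (24) is a constant multiple of the right side. With these values the constant equals $\sqrt{2\pi \sigma_0^2 (\sigma_1^2 - \sigma_0^2)}/\sigma_1$.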


Now let $\sigma_1 > \sigma_0$ and $n > 1$. The region $w$ given by (8) can be expressed in the
form

(26)  $\frac{\exp\left[-\frac{1}{2\sigma_1^2} \sum (x_i - \bar{x})^2\right]}{\exp\left[-\frac{1}{2\sigma_0^2} \sum (x_i - \bar{x})^2\right]} \cdot \frac{\exp\left[-\frac{n}{2\sigma_1^2} (\bar{x} - \xi_1)^2\right]}{\int \exp\left[-\frac{n}{2\sigma_0^2} (\bar{x} - \xi)^2\right] d\lambda(\xi)} \ge k$.

By lemma 1 there exists an absolutely continuous function $\lambda$ for which the second
factor is constant. For this $\lambda$ (26) is equivalent to

(27)  $\sum (x_i - \bar{x})^2 \ge c$,

and since this is a similar region, the conditions of theorem 1 are satisfied provided
$c$ is chosen so as to give the correct level of significance.

We next consider the problem in which the random variables $X_i$ $(i = 1, \dots, n)$
are independently normally distributed with unknown means $\xi_i$ and unknown
variances $\sigma_i^2$. We wish to test the hypothesis $H_0$: $\sigma_i = \sigma_{i0}$ for $i = 1, \dots, n$
against the alternative $H_1$: $\sigma_i = \sigma_{i1}$, $\xi_i = \xi_{i1}$. Feller [12] showed that there
exist no similar regions for this problem. However, as we shall show now, when
the critical regions are not required to be similar, non-trivial tests against $H_1$
do exist provided $\sigma_{i1} < \sigma_{i0}$ for at least one value of $i$.

Let us assume without loss of generality that $\sigma_{i1} < \sigma_{i0}$ for $i = 1, \dots, m$;
$\sigma_{i1} \ge \sigma_{i0}$ for $i = m + 1, \dots, n$, where $n - m$ may be zero but where for the
moment we shall assume $m > 0$. With $\lambda(\xi_1, \dots, \xi_n) = \prod_{i=1}^{n} \lambda_i(\xi_i)$, the region
(8) becomes

(28)  $\prod_{i=1}^{m} \frac{\exp\left[-\frac{1}{2\sigma_{i1}^2}(x_i - \xi_{i1})^2\right]}{\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2\sigma_{i0}^2}(x_i - \xi_i)^2\right] d\lambda_i(\xi_i)} \cdot \prod_{i=m+1}^{n} \frac{\exp\left[-\frac{1}{2\sigma_{i1}^2}(x_i - \xi_{i1})^2\right]}{\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2\sigma_{i0}^2}(x_i - \xi_i)^2\right] d\lambda_i(\xi_i)} \ge k$.

For $\lambda_i$ $(i = 1, \dots, m)$ we take step functions with a single jump at $\xi_{i1}$, while
for the remaining $\lambda$’s we choose the absolutely continuous functions which make
the second factor constant and whose existence is guaranteed by lemma 1. The
region (28) thus reduces to

(29)  $\sum_{i=1}^{m} \left(\frac{1}{\sigma_{i1}^2} - \frac{1}{\sigma_{i0}^2}\right)(x_i - \xi_{i1})^2 \le c$.

Since the probability of the region (29) is independent of $\xi_{m+1}, \dots, \xi_n$ and with
varying $\xi_1, \dots, \xi_m$ takes on its maximum when $\xi_i = \xi_{i1}$, it follows from theorem
1 that this region is most powerful for testing $H_0$ against $H_1$.

We still have to consider the case $m = 0$, that is, the case in which $\sigma_{i1} \ge \sigma_{i0}$
for all $i$. To treat this problem we adjoin to the variables $X_1, \dots, X_n$ a random
variable $Y$ uniformly distributed between 0 and 1, that is, essentially a table of
random numbers. In the space of $n + 1$ random variables we determine a region
$w$ according to (8), letting $\lambda(\xi_1, \dots, \xi_n) = \prod_{i=1}^{n} \lambda_i(\xi_i)$ and choosing the $\lambda$’s so
as to make the left hand side of (8) equal to the right hand side. This is possible
by lemma 1 and with this choice of the $\lambda$’s the inequalities (8) become

(30)  $k \ge k$ in $w$;
      $k \le k$ in $R_{n+1} - w$,

and hence they impose no restrictions on $w$. Thus any similar region of the correct
size will satisfy the conditions of theorem 1. It follows that the region

(31)  $w$: $0 < y \le \epsilon$,

being a similar region of size $\epsilon$, is most powerful. This result means that we do
not use the observations $x_1, \dots, x_n$ at all but consult a table of random numbers.

The situation just described occurs in other problems to which the same 
method of proof can be applied. It is therefore convenient for later reference to 
formulate the following 

THEOREM 2. Let $H_0$ be the hypothesis that the random variable $X$ is distributed
according to a probability density function $f_\theta$ with $\theta$ in $\omega$, and let $H_1$ denote the
alternative that $X$ is distributed according to the density function $g$. Let $Y$ be a random
variable known to be uniformly distributed over the interval [0, 1]. If there exists a
real valued function $\lambda$ satisfying (7) for which

(32)  $g(x) = k \int_\omega f_\theta(x)\,d\lambda(\theta)$,

then the critical region $0 < y \le \epsilon$ is most powerful for testing $H_0$ against $H_1$ at level
of significance $\epsilon$.


4. Testing equality of variances and the value of the circular serial correlation 
coefficient. For each $i = 1, \dots, m$ let $X_{ij}$ $(j = 1, \dots, n_i)$ be a sample from a
normal distribution with $E(X_{ij}) = \xi_i$ and $E(X_{ij} - \xi_i)^2 = \sigma_i^2$. We are concerned
with the hypothesis $H_0$ that $\sigma_1 = \sigma_2 = \dots = \sigma_m$, where first we shall
assume the $\xi$’s to be known, so that without loss of generality we may assume
them equal to 0. The alternative hypothesis specifies $\sigma_i = \sigma_{i1}$, $i = 1, \dots, m$.
Let $\sigma$ denote the unknown common variance under $H_0$ and let $\lambda(\sigma)$ be a step
function with a single jump at a point $\sigma_0$ to be determined later. With
$k = \prod_{i=1}^{m} (\sigma_0/\sigma_{i1})^{n_i}$, the test (8) takes on the form

(33)  $\frac{\exp\left[-\frac{1}{2} \sum_{i} \frac{1}{\sigma_{i1}^2} \sum_{j} x_{ij}^2\right]}{\exp\left[-\frac{1}{2\sigma_0^2} \sum_{i} \sum_{j} x_{ij}^2\right]} \ge 1$,

or equivalently

(34)  $\frac{\sum_{i} \frac{1}{\sigma_{i1}^2} \sum_{j} x_{ij}^2}{\sum_{i} \sum_{j} x_{ij}^2} \le \frac{1}{\sigma_0^2}$.





Since the function on the left hand side is homogeneous of degree 0 in the $x$’s,
this is a similar region and the conditions of theorem 1 are therefore satisfied
provided the region has the correct size. This can be achieved for any level
of significance $\epsilon$ by proper choice of $\sigma_0$.
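
The similarity argument rests on homogeneity of degree 0: multiplying every observation by the same positive constant leaves the statistic unchanged, so its null distribution cannot depend on the common $\sigma$. The sketch below illustrates this, assuming that the statistic in (34) is the ratio of $\sum_i \sigma_{i1}^{-2} \sum_j x_{ij}^2$ to $\sum_i \sum_j x_{ij}^2$ with all means equal to 0. The sample sizes, the values of $\sigma_{i1}$, and the function name are hypothetical.

```python
import random

def ratio_statistic(samples, sigma1):
    """Assumed form of the statistic in (34); samples[i] holds x_i1, ..., x_in_i."""
    num = sum(sum(x * x for x in xs) / s ** 2 for xs, s in zip(samples, sigma1))
    den = sum(x * x for xs in samples for x in xs)
    return num / den

random.seed(0)
sigma1 = [0.5, 1.0, 2.0]                  # hypothetical alternative standard deviations
sizes = [3, 4, 5]                         # hypothetical sample sizes n_i
base = [[random.gauss(0.0, 1.0) for _ in range(n)] for n in sizes]
scaled = [[7.0 * x for x in xs] for xs in base]   # same draws with common sigma = 7

t_base = ratio_statistic(base, sigma1)
t_scaled = ratio_statistic(scaled, sigma1)
print(t_base, t_scaled)
```

Both calls give the same value up to rounding, so any region defined through this statistic has the same probability for every common $\sigma$ under the null hypothesis.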

As stated earlier, the conditions of theorem 1 imply that the size of the critical
region is equal to $\epsilon$ at all points of increase of $\lambda$. As a consequence, if the size
equals $\epsilon$ at only a finite number of points of $\omega$, $\lambda$ must be a step function. Also
if each point of a certain interval is a point of increase of $\lambda$, the critical region
must be similar over that interval (and, if the functions involved are analytic,
the region must be similar over $\omega$). However, the last problem shows that the
converse of neither of these two statements is correct. For the region (34)
is a similar region although the corresponding $\lambda$ has only a single point of increase.

Next we consider the hypothesis of equality of variances without assuming the 
means to be known. For the case m = 2 the most powerful similar region was 
obtained by Neyman and Pearson [1]. We assume first that $n_i > 1$ for all $i$,
and we take $\lambda(\sigma, \xi_1, \dots, \xi_m) = \lambda_0(\sigma) \prod_{i=1}^{m} \lambda_i(\xi_i)$, with $\lambda_0(\sigma)$ as before a step function
with a single jump at a point $\sigma_0$ to be determined later. Suppose now that
$\sigma_0 > \sigma_{i1}$ for $i = 1, \dots, s$; $\sigma_0 \le \sigma_{i1}$ for $i = s + 1, \dots, m$, $\sigma_{11} \le \sigma_{21} \le \dots$,
where $0 \le s \le m$ and $s$ depends on $\sigma_0$. Then define

(35)  $\lambda_i(\xi_i) = 0$ if $\xi_i < \xi_{i1}$;  $\lambda_i(\xi_i) = 1$ if $\xi_i \ge \xi_{i1}$

for $i = 1, \dots, s$ and use lemma 1 for $i = s + 1, \dots, m$.
For proper choice of $k$ the critical region will then be determined by the inequality

(36)  $\sum_{i=s+1}^{m} \left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_{i1}^2}\right) \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 - \sum_{i=1}^{s} \left(\frac{1}{\sigma_{i1}^2} - \frac{1}{\sigma_0^2}\right) \sum_{j=1}^{n_i} (x_{ij} - \xi_{i1})^2 \ge 0$.

The probability of this region, computed under $H_0$, is independent of $\xi_{s+1}, \dots,
\xi_m$ and for any $\sigma$ attains its maximum when $\xi_i = \xi_{i1}$ $(i = 1, \dots, s)$. Since the
probability of the region is independent of $\sigma$ when $\xi_i = \xi_{i1}$ for $i = 1, \dots, s$,
the conditions of theorem 1 are again established. That for $\xi_i = \xi_{i1}$ the size of
(36) goes continuously from 0 to 1 with decreasing $\sigma_0$ is easily checked since at
the only doubtful points $\sigma_0 = \sigma_{i1}$ (where the value of $s$ changes), the corresponding
coefficient $\frac{1}{\sigma_0^2} - \frac{1}{\sigma_{i1}^2}$ passes through 0.

We still have to consider the case that some of the $n_i$ are equal to 1. If $n_i = 1$
for some $i \le s$ there is no change whatever, while if $n_i = 1$ for some $i > s$,
the corresponding term in (36) vanishes. It follows easily that if $n_i > 1$ for at
least one value of $i > 1$ the solution (36) is valid. On the other hand, if $n_i = 1$
for all $i > 1$, we can apply theorem 2 by taking $\sigma_0 = \sigma_{11}$, $\lambda_1(\xi_1)$ as a step function
with a single jump at $\xi_{11}$ and the remaining $\lambda_i(\xi_i)$ according to lemma 1. It thus
follows that for this problem no non-trivial test exists.

The following problem can be reduced to the hypothesis of equality of variances
with means assumed known: Under the null hypothesis $X_1, \dots, X_n$ have
a joint multivariate normal distribution with density $C \exp\left[-\frac{1}{2\sigma^2} \sum a_{ij} x_i x_j\right]$,
where the $a$’s are known and where $\sigma$ is an unknown scale factor. Under $H_1$
the $X$’s have a joint multivariate normal distribution with density
$C' \exp\left[-\frac{1}{2} \sum b_{ij} x_i x_j\right]$. A number of hypotheses specifying the value of one or several
correlation coefficients have this form. The most powerful test of $H_0$ against
$H_1$ is given by

(36)  $\frac{\sum b_{ij} x_i x_j}{\sum a_{ij} x_i x_j} \le c$,

as is easily shown by applying a non-singular linear transformation which reduces
$\sum b_{ij} x_i x_j$ to diagonal form and $\sum a_{ij} x_i x_j$ to a sum of squares, or by applying
directly the method of proof of the earlier problem.

A corresponding reduction when the X’s have a common but unknown mean is 
usually impossible. One problem of this kind for which the solution is simple is 
the hypothesis specifying the value of a serial correlation coefficient in a circular 
population. The most powerful similar region for testing this hypothesis was 
obtained in [7]. Consider the probability density function 


: 2 \2 
¢ exp E {du (2x; -— op 6(2; a z) | 
(37) L 2 a ) 

(tas. = 11), |6| <1, 





and let $H_0$ specify $\delta = \delta_0$ while $H_1$ assigns to the parameters the values $\sigma_1$, $\xi_1$, $\delta_1$.
Then the most powerful test of $H_0$ against $H_1$ is


Ln 


Z(x; — £)(tin. — 2) ck e+ ee. 
=(x; — #)? ~ ee 
(38) 


< 0. 


Yai — &)(tin — ates a 

. é1) i 1) <k’ if & 
Z(z; = &,)? 

We shall omit the proof of this result, since the method is the same as in the other 

problems considered in this section.


5. Student’s hypothesis and some generalizations. As the principal result of 
the present section we shall prove that for testing Student’s hypothesis against a 
simple alternative the most powerful test is a non-similar region of the form 


(39)  $\sum (x_i - a)^2 \le k$,

if the level of significance $\epsilon$ is less than or equal to ½. Here $a$ and $k$ depend on
$\epsilon$ and on the alternative, and they will not be determined explicitly. It will be
shown also that if $\epsilon$ is greater than or equal to ½, Student’s test is most powerful.
These results will be extended rather easily to the general univariate linear
hypothesis. The corresponding investigation for similar regions was carried
through for Student’s hypothesis by Neyman and Pearson [1] while the extension
to a general linear hypothesis is contained in a paper by Hsu [13].
The proof of the main result mentioned above is rather lengthy. We shall
begin by proving two lemmas.
LEMMA 2. Let $Y_1, \dots, Y_n$ be $n$ independent random variables, normally distributed
with 0 mean and unit variance, and let

(40)  $P(a, k) = P\left\{\sum_{i=1}^{n} (Y_i - a)^2 \le (n - k)a^2\right\}$;
      $\varphi(k) = \sup_a P(a, k)$  for $0 < k < n$, $0 < a$.

Then for each $k$ there exists $a(k)$ such that

(41)  $P(a(k), k) = \varphi(k)$.

PROOF. If $Z_i = Y_i/a$ $(i = 1, \dots, n)$ the $Z$’s are independently normally
distributed with zero mean and variance $1/a^2$ and (40) may be written as

(42)  $P(a, k) = P\left\{\sum (Z_i - 1)^2 \le n - k\right\}$.

Hence it is seen that for any $k$, $P(a, k)$ tends to zero as $a$ tends to either zero or
infinity. This proves the lemma since for any $k$, $P(a, k)$ is a continuous function
of $a$.

LEMMA 3. Given any $\epsilon$, $0 < \epsilon < \frac{1}{2}$, there exists $k(\epsilon)$ between zero and $n$ such that
$\varphi(k(\epsilon)) = \epsilon$.
PROOF. The proof will be given in a number of steps.

(i) $\varphi(k) \to \frac{1}{2}$ as $k \to 0$.

Clearly $P(a, k)$ never exceeds ½. The result will therefore follow if we exhibit
a sequence $a_k$ such that $P(a_k, k) \to \frac{1}{2}$ as $k \to 0$. Let $a_k = 1/\sqrt{k}$. Then

(43)  $P(a_k, k) = P\left\{\sqrt{k} \sum Y_i^2 - 2 \sum Y_i + \sqrt{k} \le 0\right\}$.

The right hand side is a continuous function of $k$ and therefore tends to

(44)  $P\left\{\sum Y_i \ge 0\right\} = \frac{1}{2}$,

as $k$ tends to zero.

(ii) $\varphi(k) \to 0$ as $k \to n$.

Consider $P(a, k)$ as in (42). Written as an integral of the probability density
of the $Z$’s, the region of integration is independent of $a$ and its volume tends to
0 as $k$ tends to $n$. On the other hand the probability density depends on $a$
but is uniformly bounded over the region of integration if $k > 0$, and hence the
result follows.

(iii) If $0 < k_0$, $P(a, k)$ tends to zero uniformly for $k$ in the interval $k_0 \le k < n$
as $a$ tends to zero or infinity.

This follows from the fact that $0 \le P(a, k) \le P(a, k_0)$ since $P(a, k_0)$ tends to
0 as $a$ tends to zero or infinity.

(iv) Given $k_0$ and $k_1$, there exist numbers $a_0$ and $a_1$ with $0 < a_0 < a_1 < \infty$
such that $0 < k_0 \le k \le k_1 < n$ implies $a_0 \le a(k) \le a_1$.

If this were not true there would exist a sequence $k^{(i)}$ with $k_0 \le k^{(i)} \le k_1$ and
$a(k^{(i)})$ tending to infinity or zero. Then $\varphi(k^{(i)})$ would tend to zero by (iii).
On the other hand consider $P(1, k)$ for $k_0 \le k \le k_1$. This is a continuous
non-vanishing function of $k$ and hence attains its lower bound $m$ for some $k$ in
$k_0 \le k \le k_1$. Therefore $m$ is positive and we have a contradiction.

(v) Given any $k_0$, $k_1$ with $0 < k_0 < k_1 < n$, $\varphi(k)$ is continuous on the interval
$[k_0, k_1]$.

To see this, select $a_0$ and $a_1$ in accordance with (iv). Then $P(a, k)$ is uniformly
continuous in the rectangle $a_0 \le a \le a_1$, $k_0 \le k \le k_1$. Given $\eta > 0$ let $\delta$ be
such that $|k' - k''| < \delta$ implies $|P(a, k') - P(a, k'')| < \eta$. Then
$\varphi(k') \ge P(a(k''), k') \ge P(a(k''), k'') - \eta = \varphi(k'') - \eta$, and by symmetry
$\varphi(k'') \ge \varphi(k') - \eta$, which establishes the continuity of $\varphi$.

The proof of the lemma is now immediate. For let $0 < \epsilon < \frac{1}{2}$. It follows
from (i) and (ii) that there exist $k_0$ and $k_1$ such that

$\varphi(k_1) \le \epsilon/2$,  $\varphi(k_0) \ge \epsilon + \frac{1}{2}\left(\frac{1}{2} - \epsilon\right)$,

and hence by (v) there exists $k(\epsilon)$ for which $\varphi(k(\epsilon)) = \epsilon$.
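
Lemmas 2 and 3 are existence statements, but for $n = 1$ the quantities involved are easy to compute, which gives a concrete picture of $\varphi(k)$. The sketch below assumes $n = 1$, where $(Y - a)^2 \le (1 - k)a^2$ is an interval for $Y$, replaces the supremum over $a$ by a maximum over a log-spaced grid, and finds $k(\epsilon)$ by bisection. The grid, the bisection bounds, and the function names are choices made here, not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf

def P(a, k):
    """P(a, k) of (40) for n = 1: chance that (Y - a)^2 <= (1 - k) a^2, Y ~ N(0, 1)."""
    r = sqrt(1.0 - k)
    return PHI(a * (1.0 + r)) - PHI(a * (1.0 - r))

# Log-spaced grid of a values standing in for the supremum over a > 0.
A_GRID = [10.0 ** (-3.0 + 6.0 * j / 2000) for j in range(2001)]

def phi(k):
    """Grid approximation to phi(k) = sup over a of P(a, k)."""
    return max(P(a, k) for a in A_GRID)

def solve_k(eps, iters=50):
    """Bisection for phi(k) = eps; phi is decreasing in k because the region shrinks."""
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(mid) > eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k_star = solve_k(0.05)
print(k_star, phi(k_star), phi(0.001), phi(0.999))
```

The computed values behave as steps (i) and (ii) describe: `phi` stays below ½, approaches ½ for small `k`, and becomes small as `k` approaches `n = 1`.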

Let us now consider Student’s hypothesis. The random variables $X_1, \dots, X_n$
are a sample from a normal distribution which under $H_0$ has mean 0 and unknown
variance $\sigma^2$, while under $H_1$ the mean is $\xi_1$ and the variance $\sigma_1^2$. Without
loss of generality we shall assume $\xi_1 > 0$. Applying theorem 1 with $\lambda$ a step-function
having a single jump at a point $\sigma_0 > \sigma_1$ to be determined later, we obtain
the critical region in the form

(45)  $\sum x_i^2 - \frac{2\xi_1}{1 - \sigma_1^2/\sigma_0^2} \sum x_i \le c$.

Let $Y_i = X_i/\sigma$ so that under $H_0$ the $Y$’s are distributed with zero mean and unit
variance. Then (45) becomes

(46)  $\sum Y_i^2 - \frac{2\xi_1}{(1 - \sigma_1^2/\sigma_0^2)\sigma} \sum Y_i \le \frac{c}{\sigma^2}$,

which may be written as

(47)  $\sum (Y_i - a)^2 \le (n - k)a^2$,

where

(48)  $a = \frac{\xi_1}{(1 - \sigma_1^2/\sigma_0^2)\sigma}$,  $k = -\frac{c}{\xi_1^2}\left(1 - \frac{\sigma_1^2}{\sigma_0^2}\right)^2$.

As $\sigma$ varies from 0 to $\infty$, $a$ goes from $\infty$ to 0. Let $P(a, k)$, $\varphi(k)$ and $a(k)$ be
defined as in lemma 2. Given the level of significance $\epsilon$ $(0 < \epsilon < \frac{1}{2})$, let $k^*$
and $a^*$ be determined according to lemmas 2 and 3 so that

(49)  $\varphi(k^*) = \epsilon$  and  $P(a^*, k^*) = \varphi(k^*)$.

We now select $\sigma_0 > \sigma_1$ and $c$ so that

(50)  $a^* = \frac{\xi_1}{(1 - \sigma_1^2/\sigma_0^2)\sigma_0}$  and  $k^* = -\frac{c}{\xi_1^2}\left(1 - \frac{\sigma_1^2}{\sigma_0^2}\right)^2$.

We have to show that for this choice of $\sigma_0$ and $c$ the size of the critical region
attains its maximum when $\sigma = \sigma_0$ and that this maximum size is $\epsilon$. Substituting
from (50) we express the region (47) in the form

(51)  $\sum \left(Y_i - \frac{\sigma_0}{\sigma} a^*\right)^2 \le (n - k^*) \frac{\sigma_0^2}{\sigma^2} a^{*2}$.

Thus the probability of the region is

(52)  $P\left(\frac{\sigma_0}{\sigma} a^*, k^*\right)$.

As $\sigma$ varies, (52) attains its maximum when $\frac{\sigma_0}{\sigma} a^* = a(k^*) = a^*$, that is, when
$\sigma = \sigma_0$, and the maximum value of (52) is $\varphi(k^*) = \epsilon$.

This derivation is valid even when $n = 1$, i.e., when the hypothesis $\xi = 0$ is
to be tested by observing only a single random variable $X$, known to be normally
distributed but whose mean $\xi$ and variance are unknown. For this problem
no similar region exists. However, critical regions of the form $0 < \xi_1 - a \le
x \le \xi_1 + b$ will give any level of significance $< \frac{1}{2}$ for proper choice of $a$ and $b$,
while the power of such regions will tend to 1 as $\sigma_1$ tends to 0. Therefore, the
power of the most powerful test will be close to 1 if $\sigma_1$ is sufficiently small.
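
The single-observation case can be made concrete with a few lines of arithmetic. The sketch below takes the interval region $\xi_1 - a \le x \le \xi_1 + b$ with hypothetical values $\xi_1 = 1$, $a = b = 0.5$, approximates its level by maximizing the null probability over a grid of $\sigma$ values, and evaluates its power for a few small values of $\sigma_1$. The grid and the function names are assumptions made for illustration.

```python
from statistics import NormalDist

PHI = NormalDist().cdf

def null_probability(sigma, xi1, a, b):
    """Probability of xi1 - a <= X <= xi1 + b when X ~ N(0, sigma^2)."""
    return PHI((xi1 + b) / sigma) - PHI((xi1 - a) / sigma)

def power(sigma1, a, b):
    """Probability of the same interval when X ~ N(xi1, sigma1^2)."""
    return PHI(b / sigma1) - PHI(-a / sigma1)

xi1, a, b = 1.0, 0.5, 0.5                       # illustrative values with xi1 - a > 0
sigmas = [10.0 ** (-2.0 + 4.0 * j / 4000) for j in range(4001)]
level = max(null_probability(s, xi1, a, b) for s in sigmas)
powers = {s1: power(s1, a, b) for s1 in (0.5, 0.2, 0.05)}
print(level, powers)
```

Because the interval lies entirely on one side of 0, its null probability stays below ½ for every $\sigma$, while the power approaches 1 as $\sigma_1$ shrinks.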
Having completed the discussion of the case $\epsilon < \frac{1}{2}$ let us next suppose that
$\epsilon \ge \frac{1}{2}$. We shall need the following

LEMMA 4. Let $c$ and $a_0$ be positive constants. Then there exists a function $f$
such that $f(a) = 0$ when $a \le a_0$ and such that for all $u > 0$

(53)  $\int_0^{\infty} e^{-au} f(a)\,da = k\,e^{-a_0 u - c\sqrt{u}}$.

This follows from the well known representation of $e^{-c\sqrt{u}}$ as a Laplace transform
by applying a translation. (53) can be checked directly by substituting

(54)  $f(a) = \frac{e^{-c^2/(4(a - a_0))}}{(a - a_0)^{3/2}}$  for $a > a_0$.
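
The direct check of (53) can be done numerically as well. The sketch below assumes that (54) is $f(a) = (a - a_0)^{-3/2} \exp[-c^2/(4(a - a_0))]$ for $a > a_0$, in which case the standard transform pair gives the constant $k = 2\sqrt{\pi}/c$ in (53). The midpoint rule, the truncation point, and the parameter values are choices made here.

```python
from math import exp, pi, sqrt

def laplace_of_f(u, c, a0, upper=60.0, steps=120000):
    """Midpoint-rule value of the integral of exp(-a*u) f(a) over a > a0, f as in (54)."""
    dt = upper / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * dt                       # t = a - a0 > 0
        total += exp(-(a0 + t) * u) * exp(-c * c / (4.0 * t)) / t ** 1.5
    return total * dt

def closed_form(u, c, a0):
    """Right hand side of (53) with the assumed constant k = 2 sqrt(pi) / c."""
    return (2.0 * sqrt(pi) / c) * exp(-a0 * u - c * sqrt(u))

c, a0 = 1.0, 0.5                                 # illustrative constants
checks = [(laplace_of_f(u, c, a0), closed_form(u, c, a0)) for u in (0.5, 1.0, 2.0)]
print(checks)
```

The numerical and closed-form values agree to several digits for each `u`, which is consistent with (53) holding for all $u > 0$.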
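Applying theorem 1 to Student’s hypothesis, where again we shall assume $\xi_1$
to be positive, for proper choice of $k$ we obtain from (8)

(55)  $\frac{\exp\left[-\frac{1}{2\sigma_1^2} \sum x_i^2 + \frac{\xi_1}{\sigma_1^2} \sum x_i\right]}{\int_0^{\infty} \exp\left[-\frac{1}{2\sigma^2} \sum x_i^2\right] \frac{1}{\sigma^n}\,d\lambda(\sigma)} \ge 1$.

It follows from lemma 4 that for any positive $c$ there exists a non-decreasing
function $\lambda$ of bounded variation with $\lambda(\sigma)$ constant for $\sigma \ge \sigma_1$, such that

(56)  $\int_0^{\infty} \exp\left[-\frac{1}{2\sigma^2} \sum x_i^2\right] \frac{1}{\sigma^n}\,d\lambda(\sigma) = \exp\left[-\frac{1}{2\sigma_1^2} \sum x_i^2 - c\sqrt{\sum x_i^2}\right]$.

For this choice of $\lambda$, (55) reduces to

(57)  $\exp\left[\frac{\xi_1}{\sigma_1^2} \sum x_i\right] \ge \exp\left[-c\sqrt{\sum x_i^2}\right]$,

and hence to

(58)  $\frac{\sum x_i}{\sqrt{\sum x_i^2}} \ge c'$.

This is a similar region and therefore most powerful for testing Student’s hypothesis
against $H_1$. By adjusting $c$, the size of the region can be made equal
to any $\epsilon \ge \frac{1}{2}$.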


The argument for $\epsilon > \tfrac12$ must be modified slightly in the case $n = 1$, that is, when we want to test Student's hypothesis on the basis of a single observation. Let us adjoin to the variable $X$ a random variable $Y$ known to be uniformly distributed over the interval $[0, 1]$. Using the same $\lambda$ and $k$ as before, (58) becomes

(59)  $\dfrac{x}{\sqrt{x^2}} \ge c'.$










For $c' = -1$ the critical region includes all points $(x, y)$ for which $x$ is positive while (59) places no restriction on which of the remaining points to include in the critical region. The similar region

(60)  $x \ge 0$;  $x < 0,\ 0 < y < 2(\epsilon - \tfrac12)$

therefore satisfies all conditions of theorem 1 and hence is most powerful.
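As a short worked check that (60) has size $\epsilon$ for every $\sigma$, the following computation may help. It assumes only what the text states: under the hypothesis $X$ is normal with mean zero, and the adjoined variable $Y$ is uniform on $[0, 1]$ and independent of $X$; it also uses $\tfrac12 < \epsilon \le 1$ so that $2(\epsilon - \tfrac12)$ lies in $(0, 1]$.

```latex
P\{X \ge 0\} \;+\; P\{X < 0\}\,P\{0 < Y < 2(\epsilon - \tfrac12)\}
  \;=\; \tfrac12 \;+\; \tfrac12\,(2\epsilon - 1) \;=\; \epsilon .
```

Since neither factor depends on $\sigma$, the region is similar, as claimed.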

In extending these results to the general linear hypothesis, we shall assume the hypothesis reduced to canonical form [14, 15]. We shall therefore assume that $X_1, \cdots, X_n$ are normally distributed with common variance which is unknown under $H_0$ and has the value $\sigma_1^2$ under $H_1$. Furthermore, under $H_0$, $E(X_i) = 0$ for $i = 1, \cdots, s, s+1, \cdots, m$; $E(X_i)$ unknown for $i = m+1, \cdots, n$, while under $H_1$, $E(X_i) = 0$ for $i = s+1, \cdots, m$; $E(X_i) = \xi_{i1}$ for the remaining values of $i$.

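For $\epsilon < \tfrac12$ we shall consider critical regions of the form

(61)  $\dfrac{\exp\left\{-\dfrac{1}{2\sigma_1^2}\left[\sum_{i=1}^{s}(x_i - \xi_{i1})^2 + \sum_{i=s+1}^{m} x_i^2 + \sum_{i=m+1}^{n}(x_i - \xi_{i1})^2\right]\right\}}{\exp\left\{-\dfrac{1}{2\sigma_0^2}\left[\sum_{i=1}^{s} x_i^2 + \sum_{i=s+1}^{m} x_i^2 + \sum_{i=m+1}^{n}(x_i - \xi_{i1})^2\right]\right\}} \ge k,$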
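which are obtained from (8) by substituting for $\lambda$ a step-function with a single jump at the parameter point $(\sigma_0, \xi_{m+1,1}, \cdots, \xi_{n,1})$. Making an orthonormal transformation from $x_1, \cdots, x_s$ to $y_1, \cdots, y_s$ such that

$y_1 = \dfrac{\sum_{i=1}^{s}\xi_{i1}x_i}{\sqrt{\sum_{i=1}^{s}\xi_{i1}^2}}$

and letting $y_i = x_i$ for $i = s+1, \cdots, m$; $y_i = x_i - \xi_{i1}$ for $i = m+1, \cdots, n$, (61) reduces to

(62)  $\dfrac{\exp\left\{-\dfrac{1}{2\sigma_1^2}\left[\sum_{i=1}^{n} y_i^2 - 2y_1\sqrt{\sum_{i=1}^{s}\xi_{i1}^2}\right]\right\}}{\exp\left\{-\dfrac{1}{2\sigma_0^2}\sum_{i=1}^{n} y_i^2\right\}} \ge k'.$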
For $\sigma_0 > \sigma_1$ we can rewrite (62) as

(63)  $\left(y_1 - \dfrac{\sqrt{\sum_{i=1}^{s}\xi_{i1}^2}}{1 - \sigma_1^2/\sigma_0^2}\right)^2 + \sum_{i=2}^{m} y_i^2 \le c - \sum_{i=m+1}^{n} y_i^2,$

and we see that under $H_0$ for any $\sigma$ the size of this region considered as a function of the unknown means of $y_{m+1}, \cdots, y_n$ takes on its maximum when these means are zero, i.e. when $\xi_i = \xi_{i1}$ for $i = m+1, \cdots, n$. For these maximizing values of the means the existence of a suitable $\sigma_0$ and $c$ follows from the corresponding result in connection with Student's hypothesis.
Thus the most powerful test for testing $H_0$ against $H_1$ at level of significance $\epsilon < \tfrac12$ has the form

(64)  $\sum_{i=1}^{s}\left[x_i - \dfrac{\xi_{i1}}{1 - \sigma_1^2/\sigma_0^2}\right]^2 + \sum_{i=s+1}^{m} x_i^2 + \sum_{i=m+1}^{n}(x_i - \xi_{i1})^2 \le c.$

It is interesting that the variables $X_i$ $(i = m+1, \cdots, n)$ which may be discarded when considerations are restricted to similar regions [18], do contribute to the power when similarity is not required. The same phenomenon also occurs in certain problems considered earlier in this paper.

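For the case $\epsilon > \tfrac12$, let us take

(65)  $\lambda(\sigma, \xi_{m+1}, \cdots, \xi_n) = \lambda(\sigma)\prod_{i=m+1}^{n}\lambda_i(\xi_i \mid \sigma).$

We shall select $\lambda(\sigma)$ such that $\lambda(\sigma)$ is constant when $\sigma > \sigma_1$. Hence it is enough to define $\lambda_i(\xi_i \mid \sigma)$ for $\sigma < \sigma_1$. For any $\sigma < \sigma_1$ there exists by lemma 1 a function $\lambda_i(\xi_i \mid \sigma)$ such that

(66)  $\displaystyle\int \exp\left\{-\dfrac{1}{2\sigma^2}(x_i - \xi_i)^2\right\}d\lambda_i(\xi_i \mid \sigma) = \exp\left\{-\dfrac{1}{2\sigma_1^2}(x_i - \xi_{i1})^2\right\}.$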
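For this choice of the $\lambda_i$, (9) becomes

(67)  $\dfrac{\exp\left\{-\dfrac{1}{2\sigma_1^2}\left[\sum_{i=1}^{s}(x_i - \xi_{i1})^2 + \sum_{i=s+1}^{m} x_i^2\right]\right\}}{\displaystyle\int_0^\infty \exp\left\{-\dfrac{1}{2\sigma^2}\sum_{i=1}^{m} x_i^2\right\}\dfrac{1}{\sigma^m}\,d\lambda(\sigma)} \ge k'.$

Next we choose $\lambda(\sigma)$ according to lemma 4 such that

(68)  $\displaystyle\int_0^\infty \exp\left\{-\dfrac{1}{2\sigma^2}\sum_{i=1}^{m} x_i^2\right\}\dfrac{1}{\sigma^m}\,d\lambda(\sigma) = \exp\left\{-\dfrac{1}{2\sigma_1^2}\sum_{i=1}^{m} x_i^2 - c\sqrt{\sum_{i=1}^{m} x_i^2}\right\},$

thus, by proper choice of $k'$ and after absorbing the factor $\sigma_1^2$ into $c$, reducing (67) to

(69)  $\dfrac{\sum_{i=1}^{s}\xi_{i1}x_i}{\sqrt{\sum_{i=1}^{m} x_i^2}} \ge -c.$

The probability of this region under $H_0$ is independent of $\xi_{m+1}, \cdots, \xi_n$ and $\sigma$, and hence (69) is most powerful for testing $H_0$ against $H_1$.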

Let us return once more to the problem of testing Student's hypothesis against a simple alternative $\xi = \xi_1$, $\sigma = 1$ and let us assume as known that $\sigma \le 1$. No use can be made of this knowledge if consideration is restricted to similar regions. For the probability of first kind error is an analytic function of $\sigma$, and consequently, if a test is similar with respect to all values of $\sigma$ which are $< 1$, it is similar with respect to all values of $\sigma$. Let us now consider this problem without the restriction of similarity. If $\epsilon > \tfrac12$, the knowledge concerning $\sigma$ does not enable us to find a test which is more powerful than that given by (58), since the function $\lambda(\sigma)$ on which (58) was based had all its points of increase for $\sigma < 1$.

On the other hand we may expect improvement for $\epsilon < \tfrac12$ since the most powerful test in this case was based on a function $\lambda$ with a single point of increase $\sigma_0 > 1$ which is no longer admitted as a possible value of $\sigma$. If, instead, we take for $\lambda$ the step function with a single jump at $\sigma = 1$ we obtain the critical region

(70)  $\dfrac{\exp\left[-\tfrac12\sum(x_i - \xi_1)^2\right]}{\exp\left[-\tfrac12\sum x_i^2\right]} \ge k,$

which is equivalent to

(71)  $\bar{x} \ge c.$

Here $c > 0$ since $\epsilon < \tfrac12$, and therefore, when $\xi = 0$ the probability of (71) is an increasing function of $\sigma$ and hence takes on its maximum at $\sigma = 1$. It follows from theorem 1 that (71) is most powerful under the conditions stated.

In the opposite problem in which it is known that $\sigma \ge 1$, the situation is reversed. For $\epsilon < \tfrac12$ no improvement over (45) is possible while for $\epsilon > \tfrac12$ we can use for $\lambda$ the step function with a single step at $\sigma = 1$ thus obtaining the critical region (70) but this time with $c < 0$. When $\xi = 0$ the probability of this region is a decreasing function of $\sigma$ and it follows that (70) is most powerful in this case.

Similar remarks apply to other problems. We mention as one further example a modification of the Behrens-Fisher problem. Let $X_1, \cdots, X_n$ and $Y_1, \cdots, Y_m$ be independently normally distributed, the $X$'s with mean $\xi$ and variance $\sigma^2$, the $Y$'s with mean $\eta$ and variance $\tau^2$, all four parameters being unknown. We wish to test, at level of significance $\epsilon < \tfrac12$, the hypothesis $\xi = \eta$ against the simple alternative $\xi = \xi_1$, $\eta = \eta_1$, $\sigma = 1$, $\tau = 1$, where $\xi_1 \ne \eta_1$ and we assume it known that $\sigma \le 1$, $\tau \le 1$. Basing the test on a step function $\lambda$ with a single jump at $\sigma = 1$, $\tau = 1$, $\xi = \eta = \dfrac{n\xi_1 + m\eta_1}{n + m}$ we obtain for $w$ the region

$\dfrac{\exp\left[-\tfrac12\sum_{i=1}^{n}(x_i - \xi_1)^2 - \tfrac12\sum_{j=1}^{m}(y_j - \eta_1)^2\right]}{\exp\left[-\tfrac12\sum_{i=1}^{n}\left(x_i - \dfrac{n\xi_1 + m\eta_1}{n + m}\right)^2 - \tfrac12\sum_{j=1}^{m}\left(y_j - \dfrac{n\xi_1 + m\eta_1}{n + m}\right)^2\right]} \ge k,$

which is equivalent to

(74)  $\bar{y} - \bar{x} \ge c \quad (c > 0),$

if we assume, as we may without loss of generality, that $\eta_1 > \xi_1$. When $\eta = \xi$, $\bar{Y} - \bar{X}$ is normally distributed with zero mean and variance $\dfrac{\sigma^2}{n} + \dfrac{\tau^2}{m}$. Therefore the probability of (74) is an increasing function of $\dfrac{\sigma^2}{n} + \dfrac{\tau^2}{m}$ and hence attains its maximum when $\sigma = \tau = 1$. It follows from theorem 1 that the region (74) is most powerful for the problem under consideration.
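The step from the likelihood-ratio region to (74) can be written out as follows. This is a sketch of the algebra only; the abbreviation $\mu = (n\xi_1 + m\eta_1)/(n + m)$ is introduced here for convenience and is not notation from the paper.

```latex
\begin{aligned}
\log(\text{ratio})
 &= -\tfrac12\sum_{i=1}^{n}\bigl[(x_i-\xi_1)^2-(x_i-\mu)^2\bigr]
    -\tfrac12\sum_{j=1}^{m}\bigl[(y_j-\eta_1)^2-(y_j-\mu)^2\bigr] \\
 &= n(\xi_1-\mu)\,\bar{x} + m(\eta_1-\mu)\,\bar{y} + \text{const} \\
 &= \frac{nm}{n+m}\,(\eta_1-\xi_1)\,(\bar{y}-\bar{x}) + \text{const},
\end{aligned}
\qquad\text{since}\quad
\xi_1-\mu = -\frac{m(\eta_1-\xi_1)}{n+m},\quad
\eta_1-\mu = \frac{n(\eta_1-\xi_1)}{n+m}.
```

With $\eta_1 > \xi_1$ the coefficient of $\bar{y} - \bar{x}$ is positive, so the region has the form (74).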


6. Admissibility. The general problem to be considered in this paper has been formulated in section 1: To obtain a region $w$

(75)  maximizing $\displaystyle\int_w g(x)\,dx$

subject to the restriction

(76)  $\displaystyle\int_w f_\theta(x)\,dx \le \epsilon$ for all $\theta \in \omega$.

Since for any particular such problem there may exist several essentially different regions satisfying these conditions, it may happen that there exists a region $w'$ such that

(77)  $\displaystyle\int_{w'} g(x)\,dx = \int_w g(x)\,dx,$

and

(78)  $\displaystyle\int_{w'} f_\theta(x)\,dx \le \int_w f_\theta(x)\,dx$ for all $\theta \in \omega$,

with inequality holding for some $\theta$. Clearly $w'$ is preferable to $w$. In this case, following the definition of Wald [4], we say that $w$ is not admissible. We shall rule out this possibility for a large class of problems by proving

THEOREM 3. If $w$ satisfies the conditions of theorem 1, and if the set of points $x$ for which equality holds in (8) has measure zero, then any region satisfying (75) and (76) differs from $w$ only on a set of measure zero.

PROOF. Without loss of generality we shall assume $\lambda$ of theorem 1 to be a distribution function. Then

$h(x) = \displaystyle\int f_\theta(x)\,d\lambda(\theta)$

is a completely specified probability density function, and $w$ is the unique³ (up to a set of measure zero) most powerful test for testing the simple hypothesis $H_0: h$ against the simple alternative $H_1: g$. Suppose now that $w'$ satisfies (75) and (76). Then

(79)  $\displaystyle\int_{w'} h(x)\,dx \le \epsilon,$

and $w'$ is most powerful for testing $H_0$ against $H_1$. It follows that $w'$ differs from $w$ at most by a null set.

Earlier we enlarged the problem of testing by adjoining to the original random variable $X$ a random variable with a known distribution. This is equivalent to the following modification of the original problem. Instead of defining a test to be a critical region (of rejection) in the space of $x$, we define it to be a critical function $\varphi$ $(0 \le \varphi(x) \le 1)$ which with every point $x$ associates a probability of rejection $\varphi(x)$. If $x$ is observed, the hypothesis is rejected with probability $\varphi(x)$ according to a table of random numbers. In the case where random numbers are not employed, $\varphi$ merely becomes the characteristic function of the set $w$.

³ One sees this easily from Neyman and Pearson's proof of the fundamental lemma [1], by using the assumption that the set of points for which equality holds in (8) has measure zero.

We shall now state a theorem which will prove admissibility for all but one of those problems treated in sections two to five, to which theorem 3 does not apply.

THEOREM 4. Suppose $\omega = \{\theta\}$ is a subset of an $s$-dimensional Euclidean space, and that for any measurable function $\varphi$ and for any set $S$ which has positive measure and is contained in $\omega$

(80)  $\displaystyle\int \varphi(x) f_\theta(x)\,dx = c$ for $\theta \in S$

implies

(81)  $\displaystyle\int \varphi(x) f_\theta(x)\,dx = c$ for $\theta \in \omega$.

(Here and in all that follows whenever a region of integration is not indicated, the integral extends over the whole $x$ space.) Suppose further that $\varphi$ is a critical function satisfying the conditions of theorem 1 and that the set $S_0$ of points of increase of $\lambda$ has positive measure. Then $\varphi$ is admissible.

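PROOF. If $\varphi$ were not admissible there would exist $\varphi_1$ with

(82)  $\displaystyle\int \varphi_1(x) g(x)\,dx = \int \varphi(x) g(x)\,dx;$

(83)  $\displaystyle\int \varphi_1(x) f_\theta(x)\,dx \le \int \varphi(x) f_\theta(x)\,dx$ for all $\theta \in \omega$;

(84)  $\displaystyle\int \varphi_1(x) f_\theta(x)\,dx < \int \varphi(x) f_\theta(x)\,dx$ for some $\theta \in \omega$.

The set $T$ of points $\theta$ for which (84) holds differs from $\omega$ at most by a null set. For

(85)  $\displaystyle\int [\varphi_1(x) - \varphi(x)] f_\theta(x)\,dx = 0$ for $\theta \in \omega - T$,

and if $\omega - T$ had positive measure, (85) would hold for all $\theta \in \omega$.

Let $h$ and $H_0$ be defined as in the proof of theorem 3. Since $S_0$ has positive measure, it follows that

(86)  $\epsilon = \displaystyle\int \varphi(x) h(x)\,dx > \int \varphi_1(x) h(x)\,dx = \epsilon_1$, say.

Let $\varphi_2(x) = \min\,[1,\ \varphi_1(x) + \epsilon - \epsilon_1]$. Then

(87)  $\displaystyle\int \varphi_2(x) h(x)\,dx \le \epsilon$

and

(88)  $\displaystyle\int \varphi_2(x) g(x)\,dx > \int \varphi_1(x) g(x)\,dx.$

But $\varphi_1$ is most powerful for testing $H_0$ against $H_1$ and we have a contradiction.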

By applying theorems 3 and 4 one can easily show for all but one of the problems treated in sections three to five that the tests obtained there are admissible. The one exception occurs when testing equality of variances. Simplifying the notation, since we are now concerned with a special case, we shall assume that $X_i$ $(i = 1, \cdots, n)$, $Y_1, \cdots, Y_r$ are independently and normally distributed, the $X$'s with mean $\xi_0$ and variance $\sigma_0^2$, $Y_i$ with mean $\xi_i$ and variance $\sigma_i^2$, all parameters being unknown. We wish to test the hypothesis of equality of variances against the simple alternative

$H_1: \xi_i = \xi_{i1},\ \sigma_i = \sigma_{i1}$ $(i = 0, \cdots, r)$

with

$\sigma_{01} < \sigma_{11} \le \cdots \le \sigma_{r1}.$

We shall first consider the case $n = 1$, and prove admissibility of the critical function

(89)  $\varphi(x, y_1, \cdots, y_r) \equiv \epsilon$


by using a different distribution function for the parameters from the one used 
earlier. With some specialization of the distribution function, (8) becomes 
for our problem 


2 lie l 
— is (x a? fo)” v7 + (yi <= £,)° 
a L 2om 27 oi f 


[il Joo] ~ pie -araren] 


é Ir exp | - 53 OF - e| arse} d u(o) 


i=1 





For any « < oo: we select the df” (é;) according to lemma 1. If we then take for 
u the uniform distribution over (oo, — 1, 01) the left hand side of (90) will reduce 
to k. Admissibility of the critical function (89) then follows from theorem 4. 

That a constant critical function is not admissible in the case $n > 1$ is easily seen if one compares it for instance with the critical region

| Z— ko | 

91 | ~S |= | Ke. 
” Vi — |=“ 
We shall not obtain a complete family of admissible tests (cf. [4]) for the case $n > 1$ but we shall show that this problem is equivalent to the following one: To find a complete class of unbiased admissible tests for the hypothesis specifying the mean and variance of a normal distribution on the basis of a sample from this distribution, the class of alternatives being the totality of univariate normal distributions.

Let n > 1 and let ¢ be any most powerful critical function for testing the 
hypothesis of equality of variances against H,. If y corresponds to the level of 
significance ¢ and if 8, denotes the power of y, we have 


(92) B,(o,0,°:+,0,%,8&,---,&) Se 


for all admissible values of the arguments. It also follows from section 4 that 
(93) Belo, on ee ,On, én, én, ie » En) — i 


Consider for a moment the hypothesis Ho:0; = on(t = 0,---, 7), & = én, &: 
unspecified forz = 1,---,r. It is easily seen that the maximum power for test- 
ing Ho against H, is «. Therefore any most powerful test for testing Ho against 
H, is also most powerful for testing Ho against H,, and in particular this holds 
forg. Furthermore, it follows easily from theorem 4 that for any most powerful 
test of Ho against H, the probability of an error of the first kind must be iden- 
tically equal to «. Therefore 


(94) By(on, * °° » G01, S01, &1,°°* » &) — eforallé:,---,&. 


But (94) is equivalent to the condition that ¢ is similar with respect to & , --- , 
é,, and it follows [12] that ¢ is a function of z,,--- , 2, only. The problem is 
therefore reduced to that of finding all admissible critical functions g(x; , --- , Zn) 
satisfying 


(95) Bo(o01 , £01) = €; Bo(oo , &) < efor all oo, &. 


That this problem in turn is equivalent to the one stated above is immediate when 
one considers the complementary critical functions 1 — ¢. 


REFERENCES 


[1] J. NEYMAN AND E. S. PEARSON, "On the problem of the most efficient tests of statistical hypotheses," Roy. Soc. Phil. Trans., Ser. A, Vol. 231 (1933), p. 289.

[2] A. WALD, "Test of statistical hypotheses concerning several parameters when the number of observations is large," Am. Math. Soc. Trans., Vol. 54 (1943), p. 426.

[3] A. WALD, "Statistical decision functions which minimize the maximum risk," Annals of Math., Vol. 46 (1945), p. 265.

[4] A. WALD, "An essentially complete class of admissible decision-functions," Annals of Math. Stat., Vol. 18 (1947), p. 549.

[5] J. NEYMAN, "On a statistical problem arising in routine analysis and in sampling inspection of mass production," Annals of Math. Stat., Vol. 12 (1941), p. 46.

[6] H. SCHEFFÉ, "On the theory of testing composite hypotheses with one constraint," Annals of Math. Stat., Vol. 13 (1942), p. 280.

[7] E. LEHMANN, "On optimum tests of composite hypotheses with one constraint," Annals of Math. Stat., Vol. 18 (1947), p. 473.

[8] E. LEHMANN AND H. SCHEFFÉ, "On the problem of similar regions," Proc. Nat. Acad. Sci., Vol. 33 (1947), p. 382.

[9] A. WALD, "On the power function of the analysis of variance test," Annals of Math. Stat., Vol. 13 (1942), p. 434.

[10] P. L. HSU, "On the power function of the E²-test and the T²-test," Annals of Math. Stat., Vol. 16 (1945), p. 278.

[11] H. K. NANDI, "On the average power of test criteria," Sankhyā, Vol. 8 (1946), p. 67.

[12] W. FELLER, "Note on regions similar to the sample space," Stat. Res. Memoirs, Vol. 2 (1938), p. 117.

[13] P. L. HSU, "Analysis of variance from the power function standpoint," Biometrika, Vol. 32 (1941), p. 62.

[14] S. KOLODZIEJCZYK, "On an important class of statistical hypotheses," Biometrika, Vol. 27 (1935), p. 161.

[15] P. C. TANG, "The power function of the analysis of variance tests with tables and illustrations of their use," Stat. Res. Memoirs, Vol. 2 (1938), p. 126.

[16] A. WALD, "Foundations of a general theory of sequential decision functions," Econometrica, Vol. 15 (1947), p. 279.

SYMBOLIC MATRIX DERIVATIVES 


By PAUL S. DWYER AND M. S. MACPHAIL
University of Michigan and Queen’s University 


Summary. Let $X$ be the matrix $[x_{mn}]$, $t$ a scalar, and let $\partial X/\partial t$, $\partial t/\partial X$ denote the matrices $[\partial x_{mn}/\partial t]$, $[\partial t/\partial x_{mn}]$ respectively. Let $Y = [y_{pq}]$ be any matrix product involving $X$, $X'$ and independent matrices, for example $Y = AXBX'C$. Consider the matrix derivatives $\partial Y/\partial x_{mn}$, $\partial y_{pq}/\partial X$. Our purpose is to devise a systematic method for calculating these derivatives. Thus if $Y = AX$, we find that $\partial Y/\partial x_{mn} = AJ_{mn}$, $\partial y_{pq}/\partial X = A'K_{pq}$, where $J_{mn}$ is a matrix of the same dimensions as $X$, with all elements zero except for a unit in the $m$-th row and $n$-th column, and $K_{pq}$ is similarly defined with respect to $Y$. We consider also the derivatives of sums, differences, powers, the inverse matrix and the function of a function, thus setting up a matrix analogue of elementary differential calculus. This is designed for application to statistics, and gives a concise and suggestive method for treating such topics as multiple regression and canonical correlation.


1. Introduction. The derivative of a matrix with respect to a scalar

(1)  $\dfrac{\partial Y}{\partial x} = \dfrac{\partial}{\partial x}[y_{pq}] = \left[\dfrac{\partial y_{pq}}{\partial x}\right]$

is well known and commonly used. The symbolic derivative obtained by applying a matrix of differential operators to a scalar

(2)  $\dfrac{\partial y}{\partial X} = \left[\dfrac{\partial}{\partial x_{mn}}\right] y = \left[\dfrac{\partial y}{\partial x_{mn}}\right]$

is not in such general use though some authors give special cases. For example, if $A$ is a symmetric matrix and $X$ a column matrix, so that $y = X'AX$ is a quadratic form, Fraser, Duncan and Collar [1, p. 48] write

(3)  $\begin{bmatrix} \partial/\partial x_1 \\ \partial/\partial x_2 \\ \vdots \\ \partial/\partial x_n \end{bmatrix} y = 2AX$

to indicate concisely the result of differentiating $y$ with respect to the elements $x_i$ of $X$.

It is to be noted that the matrix in (1) has the same dimensions (numbers of rows and columns) as the matrix $Y$, while the matrix in (2) has the dimensions of the matrix $X$.

We present an illustration of each of these types of symbolic matrix derivatives in order to clarify the concepts. Thus if

$Y = \begin{bmatrix} x & 2x^3 & 3x^{-4} \\ e^x & \sin x & \log_e x \end{bmatrix},$

we have

$\dfrac{\partial Y}{\partial x} = \begin{bmatrix} 1 & 6x^2 & -12x^{-5} \\ e^x & \cos x & x^{-1} \end{bmatrix},$

while if $y = x_{11}x_{32} - x_{12}x_{31}$ and

$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{bmatrix},$

we have

$\dfrac{\partial y}{\partial X} = \begin{bmatrix} x_{32} & -x_{31} \\ 0 & 0 \\ -x_{12} & x_{11} \end{bmatrix}.$

Suppose $Y$ is any matrix product involving $X$, $X'$ and independent matrices, for example, $Y = AXBX'C$. We may fix an element $x_{mn}$ of $X$ and form the matrix

(4)  $\dfrac{\partial Y}{\partial x_{mn}},$

or we may fix an element $y_{pq}$ of $Y$ and form the matrix

(5)  $\dfrac{\partial y_{pq}}{\partial X}.$

The purpose of this paper is to devise a systematic method for calculating these matrices, and to give various applications in the general field of statistics.
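By way of introduction we take the matrix product $Y = AX$ where

$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}$ and $X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{bmatrix},$

so that

$Y = \begin{bmatrix} a_{11}x_{11} + a_{12}x_{21} + a_{13}x_{31} & a_{11}x_{12} + a_{12}x_{22} + a_{13}x_{32} \\ a_{21}x_{11} + a_{22}x_{21} + a_{23}x_{31} & a_{21}x_{12} + a_{22}x_{22} + a_{23}x_{32} \end{bmatrix}.$

We have then

$\dfrac{\partial Y}{\partial x_{11}} = \begin{bmatrix} a_{11} & 0 \\ a_{21} & 0 \end{bmatrix}, \qquad \dfrac{\partial Y}{\partial x_{12}} = \begin{bmatrix} 0 & a_{11} \\ 0 & a_{21} \end{bmatrix},$

$\dfrac{\partial Y}{\partial x_{21}} = \begin{bmatrix} a_{12} & 0 \\ a_{22} & 0 \end{bmatrix}, \qquad \dfrac{\partial Y}{\partial x_{22}} = \begin{bmatrix} 0 & a_{12} \\ 0 & a_{22} \end{bmatrix},$

$\dfrac{\partial Y}{\partial x_{31}} = \begin{bmatrix} a_{13} & 0 \\ a_{23} & 0 \end{bmatrix}, \qquad \dfrac{\partial Y}{\partial x_{32}} = \begin{bmatrix} 0 & a_{13} \\ 0 & a_{23} \end{bmatrix}.$

These six equations can be combined in the single one

(6)  $\dfrac{\partial Y}{\partial x_{mn}} = AJ_{mn},$

where $J_{mn}$ is a matrix having dimensions of $X$, with all elements zero except for a unit element in the $m$-th row and $n$-th column.

Similarly we find

$\dfrac{\partial y_{11}}{\partial X} = \begin{bmatrix} a_{11} & 0 \\ a_{12} & 0 \\ a_{13} & 0 \end{bmatrix}, \qquad \dfrac{\partial y_{12}}{\partial X} = \begin{bmatrix} 0 & a_{11} \\ 0 & a_{12} \\ 0 & a_{13} \end{bmatrix},$

$\dfrac{\partial y_{21}}{\partial X} = \begin{bmatrix} a_{21} & 0 \\ a_{22} & 0 \\ a_{23} & 0 \end{bmatrix}, \qquad \dfrac{\partial y_{22}}{\partial X} = \begin{bmatrix} 0 & a_{21} \\ 0 & a_{22} \\ 0 & a_{23} \end{bmatrix}.$

These four equations can be combined in the single one

(7)  $\dfrac{\partial y_{pq}}{\partial X} = A'K_{pq},$

where $K_{pq}$ is the matrix having the dimensions of $Y$ with all elements zero except for a unit element in the $p$-th row and $q$-th column.

It should be noted that the matrices on the left of (6) and (7) are matrices composed of the basic elements $\dfrac{\partial y_{pq}}{\partial x_{mn}}$.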
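Formulas (6) and (7) for $Y = AX$ can be checked mechanically. The following sketch uses hypothetical helper names and one arbitrary numerical choice of $A$ and $X$; because $Y$ is linear in $X$, a unit perturbation of $x_{mn}$ gives the derivative exactly, so no limiting process is needed.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def unit(rows, cols, i, j):
    """Zero matrix with a single 1 in row i, column j (0-based)."""
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(cols)]
            for r in range(rows)]

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]        # 2 x 3
X = [[0.5, -1.0], [2.0, 0.0], [1.5, 3.0]]     # 3 x 2

def dY_dxmn(m, n):
    # Y = AX is linear in X, so a unit step in x_mn yields dY/dx_mn exactly.
    Xp = [r[:] for r in X]
    Xp[m][n] += 1.0
    Y0, Y1 = matmul(A, X), matmul(A, Xp)
    return [[Y1[p][q] - Y0[p][q] for q in range(2)] for p in range(2)]

def dypq_dX(p, q):
    # Collect the basic elements dy_pq/dx_mn into a matrix shaped like X.
    return [[dY_dxmn(m, n)[p][q] for n in range(2)] for m in range(3)]

# (6): dY/dx_mn = A J_mn, with J_mn having the dimensions of X.
check6 = all(dY_dxmn(m, n) == matmul(A, unit(3, 2, m, n))
             for m in range(3) for n in range(2))
# (7): dy_pq/dX = A' K_pq, with K_pq having the dimensions of Y.
check7 = all(dypq_dX(p, q) == matmul(transpose(A), unit(2, 2, p, q))
             for p in range(2) for q in range(2))
```

Both `check6` and `check7` come out true here; the same basic elements $\partial y_{pq}/\partial x_{mn}$ are simply arranged in two different ways.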


Other types of symbolic matrix derivatives could be defined and studied. We have selected these two main types because of their application to regression and correlation theory. The second type is more specifically indicated in the applications but the relations between the types are such that a simultaneous treatment seems appropriate.


2. Notation. Capital letters are used for matrices and small letters for scalars. It is understood that $Y, U, V, \cdots$ are matrices whose elements are functions of the elements $x_{mn}$ of $X$ and that $A, B, \cdots$ (unless otherwise stated) are matrices whose elements are not functions of $x_{mn}$. In the development of the formulas it is understood that the differentiation is carried out with respect to $x_{mn}$ or $X$. The matrix function differentiated is called $Y$.

We have already defined $J_{mn}$ as the matrix having the dimensions of $X$ with all elements zero except for a unit element in the $m$-th row and the $n$-th column, and we define $K_{pq}$ similarly with respect to $Y$. We now define $J'_{nm}$ as the matrix having the dimensions of $X'$ with all elements zero except for a unit element in the $n$-th row and the $m$-th column, and we define $K'_{qp}$ similarly with respect to $Y'$. All the formulas we obtain for $\dfrac{\partial Y}{\partial x_{mn}}$ involve $J_{mn}$ or $J'_{nm}$ while all those for $\dfrac{\partial y_{pq}}{\partial X}$ involve $K_{pq}$ or $K'_{qp}$.


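3. Differentiation of a constant. If $Y = A = [a_{pq}]$ we have at once

$\dfrac{\partial y_{pq}}{\partial x_{mn}} = 0.$

It follows that

(8)  $\dfrac{\partial Y}{\partial x_{mn}} = \dfrac{\partial}{\partial x_{mn}}[y_{pq}] = 0,$

(9)  $\dfrac{\partial y_{pq}}{\partial X} = \left[\dfrac{\partial}{\partial x_{mn}}\right] y_{pq} = 0,$

where the zero matrix of (8) has the dimensions of $A$, while that of (9) has the dimensions of $X$.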


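4. Differentiation of a matrix with respect to itself. If $Y = X = [x_{pq}]$ we note that

$\dfrac{\partial y_{pq}}{\partial x_{mn}} = \dfrac{\partial x_{pq}}{\partial x_{mn}} = \begin{cases} 1 & (p = m,\ q = n); \\ 0 & \text{(otherwise)}. \end{cases}$

It follows that

(10)  $\dfrac{\partial Y}{\partial x_{mn}} = \dfrac{\partial}{\partial x_{mn}}[y_{pq}] = J_{mn},$

(11)  $\dfrac{\partial y_{pq}}{\partial X} = \left[\dfrac{\partial}{\partial x_{mn}}\right] y_{pq} = K_{pq}.$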

5. Differentiation of the transpose of a matrix with respect to the matrix. Let $Y = X'$, so that

$y_{pq} = x_{qp}.$

Then

$\dfrac{\partial y_{pq}}{\partial x_{mn}} = \dfrac{\partial x_{qp}}{\partial x_{mn}} = \begin{cases} 1 & (q = m,\ p = n); \\ 0 & \text{(otherwise)}, \end{cases}$

and we have

(12)  $\dfrac{\partial Y}{\partial x_{mn}} = \dfrac{\partial}{\partial x_{mn}}[y_{pq}] = J'_{nm},$

(13)  $\dfrac{\partial y_{pq}}{\partial X} = \left[\dfrac{\partial}{\partial x_{mn}}\right] y_{pq} = K'_{qp},$

where $J'_{nm}$, $K'_{qp}$ are defined as in section 2.


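6. Differentiation of sums and differences of matrices. If

$Y = U + V - W = [u_{pq} + v_{pq} - w_{pq}],$

we have

$\dfrac{\partial y_{pq}}{\partial x_{mn}} = \dfrac{\partial u_{pq}}{\partial x_{mn}} + \dfrac{\partial v_{pq}}{\partial x_{mn}} - \dfrac{\partial w_{pq}}{\partial x_{mn}};$

then

(14)  $\dfrac{\partial Y}{\partial x_{mn}} = \dfrac{\partial}{\partial x_{mn}}[y_{pq}] = \dfrac{\partial}{\partial x_{mn}}[u_{pq} + v_{pq} - w_{pq}] = \dfrac{\partial}{\partial x_{mn}}[u_{pq}] + \dfrac{\partial}{\partial x_{mn}}[v_{pq}] - \dfrac{\partial}{\partial x_{mn}}[w_{pq}] = \dfrac{\partial U}{\partial x_{mn}} + \dfrac{\partial V}{\partial x_{mn}} - \dfrac{\partial W}{\partial x_{mn}},$

and similarly

(15)  $\dfrac{\partial y_{pq}}{\partial X} = \dfrac{\partial u_{pq}}{\partial X} + \dfrac{\partial v_{pq}}{\partial X} - \dfrac{\partial w_{pq}}{\partial X}.$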

7. General formulas for the differentiation of a two factor matrix product. Suppose $U$ is a matrix with $c$ rows and $d$ columns and $V$ is a matrix with $d$ rows and $e$ columns, then

(16)  $Y = UV = [y_{pq}] = \left[\sum_{s=1}^{d} u_{ps}v_{sq}\right].$

We have at once

(17)  $\dfrac{\partial y_{pq}}{\partial x_{mn}} = \sum_{s=1}^{d}\dfrac{\partial u_{ps}}{\partial x_{mn}}\,v_{sq} + \sum_{s=1}^{d} u_{ps}\,\dfrac{\partial v_{sq}}{\partial x_{mn}}.$

Now considering any fixed $x_{mn}$ it is clear that the first term on the right of (17) is the same as the right hand term of (16) with $\dfrac{\partial u_{ps}}{\partial x_{mn}}$ in place of $u_{ps}$. The second term on the right of (17) is likewise the same as the right hand term of (16) with $\dfrac{\partial v_{sq}}{\partial x_{mn}}$ in place of $v_{sq}$. We may then write

(18)  $\dfrac{\partial Y}{\partial x_{mn}} = \dfrac{\partial U}{\partial x_{mn}}\,V + U\,\dfrac{\partial V}{\partial x_{mn}}.$

Also considering a fixed $y_{pq}$ we have

(19)  $\dfrac{\partial y_{pq}}{\partial X} = \sum_{s=1}^{d}\dfrac{\partial u_{ps}}{\partial X}\,v_{sq} + \sum_{s=1}^{d} u_{ps}\,\dfrac{\partial v_{sq}}{\partial X}.$

It is to be noted that this formula yields matrices of the proper dimensions (those of $X$) since $\dfrac{\partial u_{ps}}{\partial X}$ and $\dfrac{\partial v_{sq}}{\partial X}$ have the dimensions of $X$. These matrices, when multiplied by the scalar values $v_{sq}$ and $u_{ps}$ and summed, yield matrices of the desired dimensions.


8. Some properties of matrix products involving J's and K's. Before deriving formulas for the differentiation of products of specific factors, it seems wise to derive some formulas exhibiting certain relations involving the $J$'s and $K$'s. Consider the matrix $A$ having $c$ rows and $d$ columns and the matrix $X$ having $d$ rows and $e$ columns. Then $Y = AX$ is a matrix with $c$ rows and $e$ columns, $J_{mn}$ one with $d$ rows and $e$ columns, $J'_{nm}$ one with $e$ rows and $d$ columns, $K_{pq}$ one with $c$ rows and $e$ columns and $K'_{qp}$ one with $e$ rows and $c$ columns.

It is easily seen by actual multiplication that

(20)  $AJ_{mn}$ is a $c \times e$ matrix with all its elements zero except those of its $n$-th column, which are those of the $m$-th column of $A$.

We omit further discussion of the dimensions of the matrices and assume that whenever a matrix product is written, the factors are conformable. Then we can show similarly that

(21)  $J_{mn}B$ is a matrix with all its elements zero except those of its $m$-th row, which are those of the $n$-th row of $B$.

Similar statements hold if $J_{mn}$ is replaced by $J'_{nm}$ or $K_{pq}$ or $K'_{qp}$. The rules are

(a) When $J_{mn}$ (or $J'_{nm}$ or $K_{pq}$ or $K'_{qp}$) is the postmultiplier, the first subscript indicates the column of the other matrix which is placed in the column indicated by the second subscript.

(b) When $J_{mn}$ (or $J'_{nm}$ or $K_{pq}$ or $K'_{qp}$) is the premultiplier, the second subscript indicates the row of the other matrix which is placed in the row indicated by the first subscript.

Notice also that

(22)  $A'K_{pq}$ is a matrix with all elements zero except those of its $q$-th column, which are those of the $p$-th column of $A'$, or the $p$-th row of $A$.

A similar result holds if $K_{pq}$ is replaced by $K'_{qp}$ or $J_{mn}$ or $J'_{nm}$.


9. Differentiation of specific two factor products. Let us start with $Y = AX$ where the various matrices involved have the dimensions indicated in the last section. Application of (18), (8), (10) gives

(23)  $\dfrac{\partial Y}{\partial x_{mn}} = \dfrac{\partial A}{\partial x_{mn}}\,X + A\,\dfrac{\partial X}{\partial x_{mn}} = AJ_{mn},$

while application of (19), (11) gives

(24)  $\dfrac{\partial y_{pq}}{\partial X} = \sum_{s=1}^{d}\dfrac{\partial a_{ps}}{\partial X}\,x_{sq} + \sum_{s=1}^{d} a_{ps}\,\dfrac{\partial x_{sq}}{\partial X} = \sum_{s=1}^{d} a_{ps}K_{sq} = a_{p1}K_{1q} + a_{p2}K_{2q} + \cdots + a_{pd}K_{dq}$

= a $d \times e$ matrix with all elements zero except those of its $q$-th column, which are those of the $p$-th row of $A$

$= A'K_{pq}$ by (22).

Similar treatment of $Y = XB$ yields

(25)  $\dfrac{\partial Y}{\partial x_{mn}} = \dfrac{\partial X}{\partial x_{mn}}\,B + X\,\dfrac{\partial B}{\partial x_{mn}} = J_{mn}B,$

(26)  $\dfrac{\partial y_{pq}}{\partial X} = \sum_{s}\dfrac{\partial x_{ps}}{\partial X}\,b_{sq} = \sum_{s} K_{ps}b_{sq} = K_{pq}B'.$

If we treat $Y = AX'$ in a similar fashion, we get

(27)  $\dfrac{\partial Y}{\partial x_{mn}} = AJ'_{nm},$

(28)  $\dfrac{\partial y_{pq}}{\partial X} = K'_{qp}A,$

while $Y = X'B$ yields

(29)  $\dfrac{\partial Y}{\partial x_{mn}} = J'_{nm}B,$

(30)  $\dfrac{\partial y_{pq}}{\partial X} = BK'_{qp}.$

It is to be noted that $J$ always has the subscripts $mn$, and similarly we find always $J'_{nm}$, $K_{pq}$, $K'_{qp}$. We may therefore omit the subscripts on these letters. When we do so we shall also write

$\dfrac{\partial Y}{\partial \langle X \rangle}$ for $\dfrac{\partial Y}{\partial x_{mn}}$ and $\dfrac{\partial \langle Y \rangle}{\partial X}$ for $\dfrac{\partial y_{pq}}{\partial X}$,

placing brackets $\langle\ \rangle$ around the matrix from which a fixed element is to be chosen. Thus if $Y = AX$, we write instead of (23) and (24)

(23a)  $\dfrac{\partial Y}{\partial \langle X \rangle} = AJ;$

(24a)  $\dfrac{\partial \langle Y \rangle}{\partial X} = A'K.$

The other results are summarized in lines 1-5 of Table I.

Examination of (18) and (19) shows that the derivatives of products with two variable factors are obtained by adding the results obtained by holding each factor constant while differentiating the other. With this in mind, (23)-(30) can be used to obtain the derivatives of double products involving $X$ and $X'$. Thus if $Y = XX$, we get

(31)  $\dfrac{\partial Y}{\partial \langle X \rangle} = JX + XJ, \qquad \dfrac{\partial \langle Y \rangle}{\partial X} = KX' + X'K.$

Other double product formulas involving $X$ and $X'$ are given in Table I.
TABLE I

Formula | Y    | ∂Y/∂⟨X⟩     | ∂⟨Y⟩/∂X
   1    | AB   | 0           | 0
   2    | AX   | AJ          | A'K
   3    | XB   | JB          | KB'
   4    | AX'  | AJ'         | K'A
   5    | X'B  | J'B         | BK'
   6    | XX   | JX + XJ     | KX' + X'K
   7    | X'X  | J'X + X'J   | XK' + XK
   8    | XX'  | JX' + XJ'   | KX + K'X
   9    | X'X' | J'X' + X'J' | X'K' + K'X'

The formulas for $\dfrac{\partial Y}{\partial \langle X \rangle}$ are written down very easily, but those for $\dfrac{\partial \langle Y \rangle}{\partial X}$ are not so easy to write. However the values of $\dfrac{\partial Y}{\partial \langle X \rangle}$ and $\dfrac{\partial \langle Y \rangle}{\partial X}$ in formulas 2-5 of Table I are such that the results for $\dfrac{\partial \langle Y \rangle}{\partial X}$ may be obtained from those for $\dfrac{\partial Y}{\partial \langle X \rangle}$ with the use of a few simple rules. They are

(a) Each $J$ becomes $K$ and each $J'$ becomes $K'$.

(b) The pre (or post) multiplier of $J$ becomes its transpose.

(c) The pre (or post) multiplier of $J'$ becomes a post (or pre) multiplier of $K'$.

These rules are immediately applicable to the double products. Thus when $Y = X'X$ we have

$\dfrac{\partial Y}{\partial \langle X \rangle} = J'X + X'J$

and so

$\dfrac{\partial \langle Y \rangle}{\partial X} = XK' + XK.$

10. Differentiation of three (or more) factor products. Products with three factors can be differentiated by the formulas of the last section if two adjacent factors are constant. Thus if $Y = ABX$, we have

$\dfrac{\partial Y}{\partial \langle X \rangle} = ABJ, \qquad \dfrac{\partial \langle Y \rangle}{\partial X} = B'A'K.$

It is not yet demonstrated that these rules are applicable to the products $AXB$ and $AX'B$. However it can be shown by the general methods indicated earlier that if $Y = AXB$, we obtain

(33)  $\dfrac{\partial Y}{\partial \langle X \rangle} = AJB, \qquad \dfrac{\partial \langle Y \rangle}{\partial X} = A'KB',$

while if $Y = AX'B$ we have

(34)  $\dfrac{\partial Y}{\partial \langle X \rangle} = AJ'B, \qquad \dfrac{\partial \langle Y \rangle}{\partial X} = BK'A.$

It is now apparent that the rules of the last section apply to situations in which there are both pre and post multipliers.

The general theory for two-factor products is immediately extendable. Thus if $Y = UVW$ with $y_{pq} = \sum_{s}\sum_{r} u_{ps}v_{sr}w_{rq}$ then the basic element is

(35)  $\dfrac{\partial y_{pq}}{\partial x_{mn}} = \sum_{s}\sum_{r}\dfrac{\partial u_{ps}}{\partial x_{mn}}\,v_{sr}w_{rq} + \sum_{s}\sum_{r} u_{ps}\,\dfrac{\partial v_{sr}}{\partial x_{mn}}\,w_{rq} + \sum_{s}\sum_{r} u_{ps}v_{sr}\,\dfrac{\partial w_{rq}}{\partial x_{mn}},$

and the formulas result from treating each factor in turn as the only variable. For example if $Y = XX'X$, we have

(36)  $\dfrac{\partial Y}{\partial \langle X \rangle} = JX'X + XJ'X + XX'J,$

and

(37)  $\dfrac{\partial \langle Y \rangle}{\partial X} = K(X'X)' + XK'X + (XX')'K = KX'X + XK'X + XX'K.$

The symbolic derivatives of certain triple product matrices are presented in Table II.
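A numerical comparison for the triple product $Y = XX'X$ may make (37) more concrete. This is a sketch only, with hypothetical helper names and an arbitrary $3 \times 2$ matrix $X$; note that here $K_{pq}$ has the dimensions of $Y$, which coincide with those of $X$.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def unit(rows, cols, i, j):
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(cols)]
            for r in range(rows)]

def add(*Ms):
    # Entrywise sum of several matrices of equal dimensions.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Ms)]

def triple(X):
    return matmul(matmul(X, transpose(X)), X)          # Y = XX'X

def formula_37(X, p, q):
    K = unit(len(X), len(X[0]), p, q)                  # K_pq, dimensions of Y
    Xt, Kt = transpose(X), transpose(K)
    return add(matmul(matmul(K, Xt), X),               # KX'X
               matmul(matmul(X, Kt), X),               # XK'X
               matmul(matmul(X, Xt), K))               # XX'K

def numeric_37(X, p, q, h=1e-5):
    out = []
    for m in range(len(X)):
        row = []
        for n in range(len(X[0])):
            up = [r[:] for r in X]
            dn = [r[:] for r in X]
            up[m][n] += h
            dn[m][n] -= h
            row.append((triple(up)[p][q] - triple(dn)[p][q]) / (2 * h))
        out.append(row)
    return out

X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.25]]
worst = max(abs(a - b)
            for p in range(3) for q in range(2)
            for ra, rb in zip(formula_37(X, p, q), numeric_37(X, p, q))
            for a, b in zip(ra, rb))
```

The largest discrepancy `worst` is of the size expected from the finite-difference step, so each factor can indeed be treated in turn as the only variable.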


The rules are sufficiently general to take care of matrices with more than three factors. Thus if $Y = A'X'XB$, we have

(38)  $\dfrac{\partial Y}{\partial \langle X \rangle} = A'J'XB + A'X'JB$

and

(39)  $\dfrac{\partial \langle Y \rangle}{\partial X} = XBK'A' + XAKB',$

and in the special case $B = A$, we get

(40)  $\dfrac{\partial Y}{\partial \langle X \rangle} = A'(J'X + X'J)A,$

(41)  $\dfrac{\partial \langle Y \rangle}{\partial X} = XA(K' + K)A'.$

Similarly if $Y = X'A'AX$, we get

(42)  $\dfrac{\partial Y}{\partial \langle X \rangle} = J'A'AX + X'A'AJ,$

and

(43)  $\dfrac{\partial \langle Y \rangle}{\partial X} = A'AXK' + A'AXK.$

Finally if $Y = XAX'AX$, we get

(44)  $\dfrac{\partial Y}{\partial \langle X \rangle} = JAX'AX + XAJ'AX + XAX'AJ,$

(45)  $\dfrac{\partial \langle Y \rangle}{\partial X} = KX'A'XA' + AXK'XA + A'XA'X'K.$

TABLE II

Formula | Y      | ∂Y/∂⟨X⟩                   | ∂⟨Y⟩/∂X
   1    | ABC    | 0                         | 0
   2    | ABX    | ABJ                       | B'A'K
   3    | AXC    | AJC                       | A'KC'
   4    | XBC    | JBC                       | KC'B'
   5    | ABX'   | ABJ'                      | K'AB
   6    | AX'C   | AJ'C                      | CK'A
   7    | X'BC   | J'BC                      | BCK'
   8    | AXX    | AJX + AXJ                 | A'KX' + X'A'K
   9    | XBX    | JBX + XBJ                 | KX'B' + B'X'K
  10    | XXC    | JXC + XJC                 | KC'X' + X'KC'
  11    | AX'X'  | AJ'X' + AX'J'             | X'K'A + K'AX'
  12    | X'BX'  | J'BX' + X'BJ'             | BX'K' + K'X'B
  13    | X'X'C  | J'X'C + X'J'C             | X'CK' + CK'X'
  14    | AX'X   | AJ'X + AX'J               | XK'A + XA'K
  15    | X'BX   | J'BX + X'BJ               | BXK' + B'XK
  16    | X'XC   | J'XC + X'JC               | XCK' + XKC'
  17    | AXX'   | AJX' + AXJ'               | A'KX + K'AX
  18    | XBX'   | JBX' + XBJ'               | KXB' + K'XB
  19    | XX'C   | JX'C + XJ'C               | KC'X + CK'X
  20    | XXX    | JXX + XJX + XXJ           | KX'X' + X'KX' + X'X'K
  21    | XXX'   | JXX' + XJX' + XXJ'        | KXX' + X'KX + K'XX
  22    | XX'X   | JX'X + XJ'X + XX'J        | KX'X + XK'X + XX'K
  23    | X'XX   | J'XX + X'JX + X'XJ        | XXK' + XKX' + X'XK
  24    | XX'X'  | JX'X' + XJ'X' + XX'J'     | KXX + X'K'X + K'XX'
  25    | X'XX'  | J'XX' + X'JX' + X'XJ'     | XX'K' + XKX + K'X'X
  26    | X'X'X  | J'X'X + X'J'X + X'X'J     | X'XK' + XK'X' + XXK
  27    | X'X'X' | J'X'X' + X'J'X' + X'X'J'  | X'X'K' + X'K'X' + K'X'X'

11. Vector results. It should be emphasized that each of the above results is a general result. More specific results may be obtained in case one (or more) of the matrices is a vector. For example if $X_c$ is a column matrix and $Y = X_c'BX_c$, then $Y$ is a scalar, so $K$ and $K'$ are both unity and we have from Table II (15)

(46)  $\dfrac{\partial \langle Y \rangle}{\partial X_c} = BX_c + B'X_c = (B + B')X_c.$

If in addition $B$ is symmetric, $B' = B$ and we have

$\dfrac{\partial \langle Y \rangle}{\partial X_c} = 2BX_c,$

which is the result indicated in (3).

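12. Differentiation of the inverse of X. It is possible to use implicit differentiation
to derive formulas for $\partial X^{-1}/\partial\langle X\rangle$ and $\partial\langle X^{-1}\rangle/\partial X$. We write $I = XX^{-1}$ and get

$$\frac{\partial I}{\partial\langle X\rangle} = 0 = JX^{-1} + X\,\frac{\partial X^{-1}}{\partial\langle X\rangle},$$

so that

$$\frac{\partial X^{-1}}{\partial\langle X\rangle} = -X^{-1}JX^{-1}, \tag{47}$$

whence

$$\frac{\partial\langle X^{-1}\rangle}{\partial X} = -(X^{-1})'K(X^{-1})'. \tag{48}$$

The formula (47) is a generalization of a known matrix differential formula
[3 : 3.4].

In a similar way we derive

$$\frac{\partial (X')^{-1}}{\partial\langle X\rangle} = -(X')^{-1}J'(X')^{-1}, \tag{49}$$
$$\frac{\partial\langle (X')^{-1}\rangle}{\partial X} = -(X')^{-1}K'(X')^{-1}. \tag{50}$$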
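
Formula (47) lends itself to a quick numerical check. The sketch below is only an illustration, not part of the original development: it assumes that J stands for the matrix with unity in the position of the element $x_{ij}$ being varied and zeros elsewhere, and it compares $-X^{-1}JX^{-1}$ with a central finite difference of $X^{-1}$ for an arbitrarily chosen 2 by 2 matrix. The helper names (`matmul`, `inv2`, `max_error_47`) and the numerical values are hypothetical.

```python
# Finite-difference check of (47) for a 2 x 2 matrix (illustrative values only).

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(m):
    """Inverse of a non-singular 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def max_error_47(x, i, j, h=1e-6):
    """Largest absolute difference between the numerical derivative of X^{-1}
    with respect to x_ij and the closed form -X^{-1} J X^{-1}."""
    x_plus = [row[:] for row in x]
    x_minus = [row[:] for row in x]
    x_plus[i][j] += h
    x_minus[i][j] -= h
    inv_plus, inv_minus = inv2(x_plus), inv2(x_minus)
    numeric = [[(inv_plus[r][c] - inv_minus[r][c]) / (2 * h)
                for c in range(2)] for r in range(2)]
    # J is assumed to be the unit matrix belonging to position (i, j).
    J = [[1.0 if (r, c) == (i, j) else 0.0 for c in range(2)] for r in range(2)]
    x_inv = inv2(x)
    closed = matmul(matmul(x_inv, J), x_inv)   # X^{-1} J X^{-1}
    return max(abs(numeric[r][c] + closed[r][c])
               for r in range(2) for c in range(2))

X = [[2.0, 1.0], [0.5, 3.0]]
errors = [max_error_47(X, i, j) for i in range(2) for j in range(2)]
```

Each entry of `errors` should be at the level of rounding error, one entry per element $x_{ij}$.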


13. Differentiation of a function of a function. The theory developed in the 
earlier sections is sufficiently general to be useful in differentiating a function of 
a function if the functions involve addition, subtraction, premultiplication, post- 
multiplication, and inverse. For example if 


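$$Y = Z'Z \quad\text{with}\quad Z = AX \tag{51}$$

we have

$$\frac{\partial Y}{\partial\langle X\rangle} = \frac{\partial Z'}{\partial\langle X\rangle}\,Z + Z'\,\frac{\partial Z}{\partial\langle X\rangle},$$

and since

$$\frac{\partial Z'}{\partial\langle X\rangle} = J'A', \qquad \frac{\partial Z}{\partial\langle X\rangle} = AJ,$$
$$\frac{\partial Y}{\partial\langle X\rangle} = J'A'Z + Z'AJ, \tag{52}$$

and thence

$$\frac{\partial\langle Y\rangle}{\partial X} = A'ZK' + A'ZK. \tag{53}$$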


These results are equivalent to those of (42) and (43). 


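14. Differentiation of a power of a square matrix. The values of the symbolic
derivatives of $X^2$, $X^3$ with respect to X are given in Tables I and II. It can
be shown similarly that if n is a positive integer

$$\frac{\partial X^n}{\partial\langle X\rangle} = JX^{n-1} + XJX^{n-2} + \cdots + X^sJX^{n-s-1} + \cdots + X^{n-1}J, \tag{54}$$

and this can be written as

$$\frac{\partial X^n}{\partial\langle X\rangle} = \sum_{s=0}^{n-1} X^sJX^{n-s-1} \tag{55}$$

if we adopt the convention that $X^0$ is I. It follows at once that

$$\frac{\partial\langle X^n\rangle}{\partial X} = \sum_{s=0}^{n-1} (X')^sK(X')^{n-s-1}. \tag{56}$$

It is thence possible to derive formulas for the symbolic derivatives of $X^{-n}$.
Since $X^{-n}X^n = I$, we have

$$\frac{\partial X^{-n}}{\partial\langle X\rangle}\,X^n + X^{-n}\left[\sum_{s=0}^{n-1} X^sJX^{n-s-1}\right] = 0, \tag{57}$$

so

$$\frac{\partial X^{-n}}{\partial\langle X\rangle} = -X^{-n}\left[\sum_{s=0}^{n-1} X^sJX^{n-s-1}\right]X^{-n} \tag{58}$$

and

$$\frac{\partial\langle X^{-n}\rangle}{\partial X} = -(X')^{-n}\left[\sum_{s=0}^{n-1} (X')^sK(X')^{n-s-1}\right](X')^{-n}. \tag{59}$$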
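
As a hedged illustration of (55), the following sketch evaluates $\sum_{s} X^sJX^{n-s-1}$ for one element $x_{ij}$ at a time and compares it with a central finite difference of $X^n$. It assumes, as in the interpretation used throughout, that J is the matrix with unity in position (i, j) and zeros elsewhere; the function names and the test matrix are hypothetical choices made only for this example.

```python
# Numerical check of (55): d(X^n)/dx_ij = sum_{s=0}^{n-1} X^s J X^{n-s-1}.

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size))
             for j in range(size)] for i in range(size)]

def matpow(x, n):
    """X^n for n >= 0, with the convention X^0 = I."""
    size = len(x)
    result = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]
    for _ in range(n):
        result = matmul(result, x)
    return result

def derivative_55(x, n, i, j):
    """Right-hand side of (55) for the element x_ij."""
    size = len(x)
    J = [[1.0 if (r, c) == (i, j) else 0.0 for c in range(size)] for r in range(size)]
    total = [[0.0] * size for _ in range(size)]
    for s in range(n):
        term = matmul(matmul(matpow(x, s), J), matpow(x, n - s - 1))
        total = [[total[r][c] + term[r][c] for c in range(size)] for r in range(size)]
    return total

def numeric_derivative(x, n, i, j, h=1e-6):
    """Central finite difference of X^n with respect to x_ij."""
    x_plus = [row[:] for row in x]
    x_minus = [row[:] for row in x]
    x_plus[i][j] += h
    x_minus[i][j] -= h
    p, m = matpow(x_plus, n), matpow(x_minus, n)
    return [[(p[r][c] - m[r][c]) / (2 * h) for c in range(len(x))] for r in range(len(x))]

X = [[1.0, 2.0], [0.5, -1.0]]
n = 4
max_error = max(
    abs(a - b)
    for i in range(2) for j in range(2)
    for row_a, row_b in zip(derivative_55(X, n, i, j), numeric_derivative(X, n, i, j))
    for a, b in zip(row_a, row_b)
)
```

The value of `max_error` should be of the order of rounding error for this small example.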


15. Applications. We consider the classical theory of least squares, a matrix 
presentation of which is available in [2]. Suppose that y and $x_i$ are measured
from their means and that y is to be estimated from the n variables $x_i$. Form
the values of y into a column matrix Y and the values of x; into an N by n matrix 
X. Introduce the column matrix B of n parameters b; and define 


(60) E = Y — XB. 


Note that the matrix E'E is in this case the single element matrix which is the
sum of the squares of the residuals. Following the least squares method we
minimize this by differentiating with respect to the elements of B. We first
note that

$$E'E = (Y' - B'X')(Y - XB) = Y'Y - Y'XB - B'X'Y + B'X'XB. \tag{61}$$

Then we write down first

$$\frac{\partial (E'E)}{\partial\langle B\rangle} = -Y'XJ - J'X'Y + J'X'XB + B'X'XJ, \tag{62}$$

from which we get

$$\frac{\partial\langle E'E\rangle}{\partial B} = -X'YK - X'YK' + X'XBK' + X'XBK = -X'(Y - XB)(K + K') = -X'E(K + K'). \tag{63}$$


The J’s and K’s are associated with B and E’E respectively. Here E’E is scalar 
so that K = K’ = 1 and we have 


$$\frac{\partial\langle E'E\rangle}{\partial B} = -2X'E. \tag{64}$$

The equation X’E = 0, obtained by equating the right hand side of (64) to zero, 
is a statement of the normal equations in matrix form. 
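
The normal equations can be illustrated with a small numerical sketch. The data below are invented solely for the example (two predictors and a response, each measured from its mean), and the function names are hypothetical. The code checks the gradient $-2X'E$ of (64) against finite differences of the residual sum of squares at an arbitrary B, and then confirms that the solution of $X'XB = X'Y$ gives $X'E = 0$.

```python
# Least squares via the normal equations X'E = 0 (illustrative data only).

def residuals(X, Y, B):
    """E = Y - XB for a single response column."""
    return [y - sum(x * b for x, b in zip(row, B)) for row, y in zip(X, Y)]

def sse(X, Y, B):
    """E'E, the sum of squared residuals."""
    return sum(e * e for e in residuals(X, Y, B))

def gradient_64(X, Y, B):
    """-2 X'E, the right-hand side of (64)."""
    E = residuals(X, Y, B)
    return [-2.0 * sum(row[c] * e for row, e in zip(X, E)) for c in range(len(B))]

def solve_normal_equations(X, Y):
    """Solve X'X B = X'Y for two predictors by Cramer's rule."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    u = sum(r[0] * y for r, y in zip(X, Y))
    v = sum(r[1] * y for r, y in zip(X, Y))
    det = a * d - b * b
    return [(u * d - b * v) / det, (a * v - b * u) / det]

# Columns of X and the values of Y each sum to zero (measured from their means).
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, -2.0], [1.0, 0.0], [2.0, 2.0]]
Y = [-3.1, -0.8, 1.9, 1.2, 0.8]

# Finite-difference gradient of E'E at an arbitrary point B0.
B0 = [0.3, -0.2]
h = 1e-6
numeric_gradient = []
for c in range(2):
    up, down = B0[:], B0[:]
    up[c] += h
    down[c] -= h
    numeric_gradient.append((sse(X, Y, up) - sse(X, Y, down)) / (2 * h))

B_hat = solve_normal_equations(X, Y)
E_hat = residuals(X, Y, B_hat)
X_t_E = [sum(row[c] * e for row, e in zip(X, E_hat)) for c in range(2)]
```

Here `X_t_E` holds the elements of $X'E$ at the fitted B, which should vanish up to rounding.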

Equation (64) may also be obtained with the use of the methods of section 
13. In this case 


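$$\frac{\partial E}{\partial\langle B\rangle} = -XJ, \qquad \frac{\partial E'}{\partial\langle B\rangle} = -J'X',$$

and we have

$$\frac{\partial (E'E)}{\partial\langle B\rangle} = \frac{\partial E'}{\partial\langle B\rangle}\,E + E'\,\frac{\partial E}{\partial\langle B\rangle} = -J'X'E - E'XJ, \tag{65}$$

so

$$\frac{\partial\langle E'E\rangle}{\partial B} = -X'EK' - X'EK = -X'E(K' + K). \tag{66}$$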


The equation (64) is also applicable to the more general problem in which 
$y_1$ and $y_2$ are estimated from the same set of variables $x_i$. The only change
needed is to regard Y, B, E as two-column matrices so that E'E is a matrix with
two rows and columns which we denote by

$$\begin{bmatrix} e_{11} & e_{12} \\ e_{21} & e_{22} \end{bmatrix}.$$

We require $\partial e_{11}/\partial B = 0$ and $\partial e_{22}/\partial B = 0$. From equation (63), inserting subscripts,
we get

$$\frac{\partial e_{11}}{\partial B} = -X'E(K_{11} + K_{11}') = -2X'EK_{11};$$
$$\frac{\partial e_{22}}{\partial B} = -2X'EK_{22}.$$

It is easily seen that $\partial e_{11}/\partial B = \partial e_{22}/\partial B = 0$ is equivalent to X'E = 0, the same equation
as we obtained in the last paragraph. We also arrive at the incidental result that
in minimizing $\sum e_1^2$ and $\sum e_2^2$ separately we find at the same time a stationary
value of $\sum e_1e_2$.

In this way we can treat two or more simultaneous regression problems with 
this general notation as easily as we can treat one. 

As a second application of the theory we outline the initial steps in the direc- 
tion of the formulas for canonical correlation [4], [5]. In this case A and B are 
unknown column vectors with X and Y known rectangular matrices. Then 
XA is a column matrix:

$$XA = \begin{bmatrix} l_1 \\ \vdots \\ l_N \end{bmatrix}$$

whose elements $l_i$ may be regarded as observed values of a linear form $l$. Similarly
$YB = \Lambda$, a column matrix whose elements may be regarded as observed
values of a linear form $\lambda$. It is desired to find A and B such that $l$ and $\lambda$ may have
the largest correlation coefficient, and to find the size of this coefficient. Then
A'X'XA, B'Y'YB, and B'Y'XA = A'X'YB are scalars, and

$$\rho = \frac{B'Y'XA}{\sqrt{(A'X'XA)(B'Y'YB)}}. \tag{67}$$

If the scales of X and Y are chosen so that A'X'XA = 1 and B'Y'YB = 1, we
have

$$\rho = B'Y'XA = A'X'YB. \tag{68}$$

Using Lagrange multipliers we set

$$\phi = B'Y'XA + \frac{c}{2}(1 - A'X'XA) + \frac{d}{2}(1 - B'Y'YB), \tag{69}$$

and differentiate with respect to the elements of A and B. We first differentiate
$\phi$ with respect to A after replacing B'Y'XA by A'X'YB:

$$\frac{\partial \phi}{\partial\langle A\rangle} = J'X'YB - \frac{c}{2}(J'X'XA + A'X'XJ); \tag{70}$$
$$\frac{\partial\langle \phi\rangle}{\partial A} = X'YBK' - \frac{c}{2}(X'XAK' + X'XAK). \tag{71}$$

(The J's and K's are associated with A and $\phi$ respectively). We set $\partial\langle\phi\rangle/\partial A = 0$
with K = K' = 1 to get

$$X'YB = cX'XA, \tag{72}$$

whence by (68)

$$\rho = A'X'YB = cA'X'XA = c, \tag{73}$$

and

$$X'YB = \rho X'XA. \tag{74}$$

Similar differentiation with respect to B gives $\rho = d$ and

$$Y'XA = \rho Y'YB. \tag{75}$$

The further steps in the development of canonical correlation theory are based 
on (74) and (75). 

A third application is to orthogonal regression. The situation is very similar 
to that of the first illustration, but the errors are measured orthogonal to the 
plane of best fit. As before we take the variates as measured from their means 
and so have the basic equation 
$$D = \frac{b_1x_1 + b_2x_2 + \cdots + b_nx_n}{\sqrt{b_1^2 + b_2^2 + \cdots + b_n^2}}. \tag{76}$$

This can be written as

$$D = l_1x_1 + l_2x_2 + \cdots + l_nx_n = XL \quad\text{with}\quad L'L = 1. \tag{77}$$

It follows that the quantity to be minimized is

$$D'D = L'X'XL. \tag{78}$$

With the use of Lagrange multipliers we have

$$\phi = L'X'XL + \lambda(1 - L'L) \tag{79}$$

so that

$$\frac{\partial \phi}{\partial\langle L\rangle} = J'X'XL + L'X'XJ - \lambda(J'L + L'J), \tag{80}$$
$$\frac{\partial\langle \phi\rangle}{\partial L} = X'XLK' + X'XLK - \lambda(LK' + LK), \tag{81}$$

from which

$$2X'XL - 2\lambda L = 0 \tag{82}$$

and the values can be determined from the equation

$$(X'X - \lambda I)L = 0. \tag{83}$$

The solution continues with the use of the characteristic equation.
It is to be noted from (79) and (82) that

$$D'D = L'X'XL = \lambda L'L = \lambda$$

so that (83) becomes

$$(X'X - D'D\,I)L = 0. \tag{84}$$
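
A small sketch may make the role of (83) and (84) concrete. It assumes two variates only, so that the characteristic equation of X'X can be solved in closed form, and it uses invented data measured from their means; the helper name `smallest_eigenpair_sym2` is hypothetical. The smaller characteristic root is taken as $\lambda$, the corresponding unit vector as L, and the code then confirms that D'D = L'X'XL equals $\lambda$.

```python
import math

def smallest_eigenpair_sym2(m):
    """Smaller characteristic root and unit characteristic vector of a
    symmetric 2 x 2 matrix [[a, b], [b, d]]."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    lam = 0.5 * (a + d) - math.sqrt(0.25 * (a - d) ** 2 + b * b)
    if b != 0.0:
        v = (b, lam - a)          # satisfies (M - lam I) v = 0
    else:
        v = (1.0, 0.0) if a <= d else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return lam, [v[0] / norm, v[1] / norm]

# Illustrative data: each column sums to zero (measured from its mean).
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, -2.0], [1.0, 0.0], [2.0, 2.0]]

# Form X'X.
XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]

lam, L = smallest_eigenpair_sym2(XtX)

# D = XL holds the orthogonal deviations; D'D should equal lam, as in (84).
D = [r[0] * L[0] + r[1] * L[1] for r in X]
DtD = sum(d * d for d in D)

# Residual of (83): (X'X - lam I) L.
residual_83 = [XtX[i][0] * L[0] + XtX[i][1] * L[1] - lam * L[i] for i in range(2)]
```

For these data X'X has characteristic roots 7 and 13, so the minimum of D'D under L'L = 1 is 7.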


A fourth illustration uses symbolic derivatives in obtaining the principal components
of a total variance [5, 252]. The variable portion of the exponent of the
multivariate normal can be written Y'AY where Y is the column vector
$[y_1, \cdots, y_k]$ and A is a k by k matrix. We set this equal to a constant, say C,
and get the equation of the k dimensional ellipsoid. It is desired to locate the
extrema of this ellipsoid. To do this we find the extrema of Y'Y. Using the
Lagrange multiplier we have

$$\phi = Y'Y + \lambda(C - Y'AY) \tag{85}$$

so that

$$\frac{\partial \phi}{\partial\langle Y\rangle} = J'Y + Y'J - \lambda(J'AY + Y'AJ), \tag{86}$$
$$\frac{\partial\langle \phi\rangle}{\partial Y} = YK' + YK - \lambda(AYK' + AYK), \tag{87}$$

so that there results

$$Y - \lambda AY = 0. \tag{88}$$

Pre-multiplying by $A^{-1}$ we get

$$(A^{-1} - \lambda I)Y = 0 \tag{89}$$

and pre-multiplying by Y' gives the important relation

$$Y'Y = \lambda C. \tag{90}$$


A fifth illustration utilizes symbolic differentiation in developing the theory 
of the linear discriminant function [6, 341] [8, 124]. As in the other illustrations,
the variates are measured about their means. The unknown multipliers are
indicated by the vector L. Then

$$Z = XL \tag{91}$$

is the general matrix equation while

$$Z_1 = X_1L, \qquad Z_2 = X_2L \tag{92}$$

are the corresponding equations for the two groups. Then

$$\bar Z_1 = \bar X_1L, \quad \bar Z_2 = \bar X_2L, \quad\text{and}\quad \bar Z_1 - \bar Z_2 = (\bar X_1 - \bar X_2)L = DL, \tag{93}$$
$$Z_1 - \bar Z_1 = (X_1 - \bar X_1)L = Y_1L, \qquad Z_2 - \bar Z_2 = (X_2 - \bar X_2)L = Y_2L. \tag{94}$$

The within group variation, $L'Y_1'Y_1L + L'Y_2'Y_2L$, is then divided into the
between group variation, L'D'DL, to get

$$G = \frac{L'D'DL}{L'Y_1'Y_1L + L'Y_2'Y_2L} = \frac{A}{B}. \tag{95}$$

We wish to maximize G. Since A and B are scalars $\partial G/\partial L = 0$ reduces to

$$\frac{\partial (B)}{\partial L} = \frac{1}{G}\,\frac{\partial (A)}{\partial L} \tag{96}$$

which becomes, with further differentiation

$$(Y_1'Y_1 + Y_2'Y_2)L = \frac{1}{G}\,D'DL. \tag{97}$$

Since $DL/G$ is a scalar, we have

$$(Y_1'Y_1 + Y_2'Y_2)L = cD'. \tag{98}$$

Any convenient value of c can be used for purposes of discrimination. It is
customary to take c = 1 and then to adjust (98) so that some $l_i$ is unity.

A final illustration applies symbolic matrix differentiation to a theorem of 
multiple factor analysis. This presentation parallels that given by Thurstone 
[7, 473-477] for transforming any factorial matrix into a principal axes matrix.
The matrix

$$F = [a_{ij}] \tag{99}$$

has p rows and r columns, $r \le p$, such that

$$FF' = R \tag{100}$$

where R is a p × p correlation matrix.

It is desired to apply the unitary orthogonal transformation L to F in such a
way as to produce a matrix, called $F_p$, which has the sums of the squares in
respective columns a maximum. This can be done by maximizing simultaneously
the diagonal terms of $F_p'F_p$ where

$$F_p = FL. \tag{101}$$

Again using Lagrange multipliers, we have

$$\phi = L'F'FL + \lambda(I - L'L). \tag{102}$$

This equation has the same analytical form as (79). Differentiation leads to
the result

$$(F'F - \lambda I)L = 0. \tag{103}$$

The solution of (103) gives the value L which can be substituted in (101) to
obtain $F_p$.


16. Conclusion. Two types of symbolic matrix derivatives have been defined.
Laws have been developed for the basic operations of addition, subtraction,
multiplication, inverse, and powers. Laws for more extended functions
can be worked out on the basis of principles enunciated.

Applications are given to certain multivariate problems. It is our thesis that
with these differentiation formulas available, much work in multivariate analysis
can be carried on with a simple matrix notation.


ON THE LIMITING DISTRIBUTIONS OF ESTIMATES BASED ON
SAMPLES FROM FINITE UNIVERSES¹


By William G. Madow
Institute of Statistics, University of North Carolina


1. Summary. The paper shows that under very broad conditions the usual 
theorems concerning the limiting distributions of estimates hold for estimates 
based on samples selected from finite universes, at random without replacement. 
It may be remarked that under the same conditions, the same conclusions are 
true for random sampling from finite universes with replacement, if the universes 
are permitted to change within the limitations set by condition W. 


2. Introduction. It has long been known that the limiting distribution of 
arithmetic means of samples selected at random with replacement from finite 
universes, or from infinite universes is normal under very general conditions. 
When, however, a sample is selected from a finite universe without replacement, 
and the size of the sample as compared with that of the universe is too large for 
the universe to be treated as infinite, the proof that the limiting distribution of 
the mean is normal appears to have been given only for the case where the universe
is multinomial.² In this paper we prove that the limiting distribution of
the mean is normal provided only that as the universe increases in size, the higher
moments do not increase too rapidly as compared with the variance, and that
for sufficiently large sizes of sample and population the ratio of size of sample to 
size of universe is bounded away from 1. Various extensions are given, but these 
are almost immediate consequences of the theorem on the limiting distribution 
of the mean. 

The method used is that of showing that the moments of the standardized mean 
tend to those of the normal distribution. In doing this we generalize a theorem
of Wald and Wolfowitz,³ by making it applicable to permutations of samples
from finite populations, and by reducing a little the conditions on the coefficients. 
The theorem on the mean is then a simple corollary. 

We also note that with these proofs on limiting distributions we can make the 
corresponding assertions concerning characteristic functions. Although no 
applications of this fact are given, it seems likely that some useful results could 
be obtained. 


3. Preliminary lemmas. In calculating the k-th moments and their limits we
shall use an infrequently given form of the multinomial expansion and some
properties of symmetric polynomials. In this section we make the necessary
definitions, and present four lemmas embodying the results we shall use.⁴


1 Presented to the American Mathematical Society at a meeting held in New York City
on April 17, 1948.

2 See F. N. David, "Limiting distributions connected with certain methods of sampling
human populations," Stat. Res. Mem., Vol. 2 (1938), pp. 69-90, especially p. 77.

3 A. Wald and J. Wolfowitz, "Statistical tests based on permutations of the observations,"
Annals of Math. Stat., Vol. 15 (1944), pp. 358-372, especially p. 359.


A t-partition of a positive integer k consists of t positive integers $\alpha_1, \cdots, \alpha_t$
such that $\alpha_1 + \cdots + \alpha_t = k$. Two t-partitions $\alpha_1, \cdots, \alpha_t$ and $\beta_1, \cdots, \beta_t$ of
k will be said to be distinct if for at least one value of h we have $\alpha_h \ne \beta_h$.

Let $\varphi(\alpha_1, \cdots, \alpha_t)$, written $\varphi(\alpha)$, be any function of the t-partitions of k. By
$\Sigma_{1t}\varphi(\alpha)$ we shall mean the summation of $\varphi(\alpha_1, \cdots, \alpha_t)$ over all distinct t-partitions
of k.

By $\Sigma_{2t}\varphi(\alpha)$ we shall mean the summation of $\varphi(\alpha)$ over all distinct permutations
of $\alpha_1, \cdots, \alpha_t$.

By $\Sigma_{3t}\varphi(\alpha)$ we shall mean the summation of $\varphi(\alpha)$ over all distinct t-partitions
of k satisfying the condition $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_t$.

Let $\psi(\nu_1, \cdots, \nu_t)$ be any function of the variables $\nu_1, \cdots, \nu_t$. Then by
$\Sigma_{4n}\psi(\nu_1, \cdots, \nu_t)$ we shall mean the summation of $\psi(\nu_1, \cdots, \nu_t)$ over all possible
selections of t integers from 1 to n arranged so that $\nu_1 > \nu_2 > \cdots > \nu_t$.


The formula for the multinomial given below is not presented as a new result. 
It is given only as a means of referring to the result we need. 

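Lemma 1. Let $\xi_1, \cdots, \xi_n$ be any quantities or random variables and let k be a
positive integer. Then

$$(\xi_1 + \cdots + \xi_n)^k = \sum_{t=1}^{k} \Sigma_{1t}\, C_{\alpha_1,\cdots,\alpha_t}\, \Sigma_{4n}\, \xi_{\nu_1}^{\alpha_1} \cdots \xi_{\nu_t}^{\alpha_t},$$

where

$$C_{\alpha_1,\cdots,\alpha_t} = \frac{k!}{\alpha_1! \cdots \alpha_t!}.$$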


The proof is omitted. 
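
Although the proof is omitted, the form of the expansion in Lemma 1 can be checked directly for small cases. The sketch below is an illustration under stated assumptions: it enumerates the ordered t-partitions of k (the range of $\Sigma_{1t}$) and the selections of t indices (the range of $\Sigma_{4n}$, taken here in increasing rather than decreasing order, which selects the same subsets), and compares the resulting sum with $(\xi_1 + \cdots + \xi_n)^k$. The function names are hypothetical.

```python
from itertools import combinations
from math import factorial, prod

def compositions(k, t):
    """All ordered t-tuples of positive integers with sum k (the t-partitions of k)."""
    if t == 1:
        yield (k,)
        return
    for first in range(1, k - t + 2):
        for rest in compositions(k - first, t - 1):
            yield (first,) + rest

def multinomial_expansion(xi, k):
    """Right-hand side of Lemma 1 for the quantities xi and the exponent k."""
    n = len(xi)
    total = 0
    for t in range(1, min(k, n) + 1):
        for alpha in compositions(k, t):
            # C_alpha = k! / (alpha_1! ... alpha_t!)
            coeff = factorial(k) // prod(factorial(a) for a in alpha)
            # One index selection per subset of size t.
            for nu in combinations(range(n), t):
                total += coeff * prod(xi[v] ** a for v, a in zip(nu, alpha))
    return total

xi = [2, -1, 3, 5]
lhs = sum(xi) ** 4
rhs = multinomial_expansion(xi, 4)
```

With integer inputs the comparison is exact, so `lhs` and `rhs` agree without any tolerance.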
The following lemma will be useful in connection with several of the results of
this section:

Lemma 2. If $\varphi(\alpha)$ is a function of the t-partitions of k, then

$$\Sigma_{1t}\varphi(\alpha) = \Sigma_{3t}\Sigma_{2t}\varphi(\alpha).$$


The verbalization of the lemma is practically its proof. 

Let us now define certain symmetric polynomials that we shall use. 

Let $S_{\alpha_1,\cdots,\alpha_t} = \Sigma\, \xi_{\nu_1}^{\alpha_1} \cdots \xi_{\nu_t}^{\alpha_t}$ where the $\alpha$'s are positive integers and the summation
extends over all possible arrangements $\nu_1, \cdots, \nu_t$ of t of the integers
$1, \cdots, N$. Hence there will be $N^{(t)} = N(N - 1) \cdots (N - t + 1)$ terms in
$S_{\alpha_1,\cdots,\alpha_t}$.

Lemma 3. Suppose that $t_1, \cdots, t_h$ are an h-partition of t, that

$$\alpha_{t_1+\cdots+t_{i-1}+1} = \cdots = \alpha_{t_1+\cdots+t_i}, \qquad (i = 1, \cdots, h;\ t_0 = 0),$$

and that

$$\alpha_1 \ne \alpha_{t_1+1} \ne \cdots \ne \alpha_{t_1+\cdots+t_{h-1}+1}.$$

Then, defining

$$S'_{\alpha_1,\cdots,\alpha_t} = \Sigma_{2t}\Sigma_{4N}\, \xi_{\nu_1}^{\alpha_1} \cdots \xi_{\nu_t}^{\alpha_t}, \tag{3.1}$$

it follows that

$$S_{\alpha_1,\cdots,\alpha_t} = t_1! \cdots t_h!\, S'_{\alpha_1,\cdots,\alpha_t}.$$

To prove Lemma 3, it is only necessary to note that each term of $S'_{\alpha_1,\cdots,\alpha_t}$ will
determine $t_1! \cdots t_h!$ equal terms of $S_{\alpha_1,\cdots,\alpha_t}$.

Although the moments that we shall obtain will be functions of $S_{\alpha_1,\cdots,\alpha_t}$, the
condition that we shall use on the moments can be interpreted directly only in
terms of $S_r$. Consequently, in order to be able to analyze the implications of
that condition on $S_{\alpha_1,\cdots,\alpha_t}$, we state the following lemma:

Lemma 4. The symmetric polynomial $S_{\alpha_1,\cdots,\alpha_t}$ is equal to a sum of products of
the form

$$\pm S_{\gamma_1}S_{\gamma_2} \cdots S_{\gamma_h},$$

where $\gamma_1, \cdots, \gamma_h$ are an h-partition of k, $h \le t$, and each $\gamma$ is a sum of one or more
of the $\alpha$'s. Furthermore, if $S_1 = 0$, then $h \le [k/2]$ where $[k/2] = k/2$ if k is even
and $[k/2] = (k - 1)/2$ if k is odd. This follows from the result

$$S_{\alpha_1,\cdots,\alpha_{t-1}}\,S_{\alpha_t} = S_{\alpha_1,\cdots,\alpha_t} + S_{\alpha_1+\alpha_t,\alpha_2,\cdots,\alpha_{t-1}} + \cdots + S_{\alpha_1,\cdots,\alpha_{t-2},\alpha_{t-1}+\alpha_t}. \tag{3.2}$$

4 The order of sections 3 and 4 is largely a matter of taste; some may prefer to treat section
3 as an appendix to section 4 to be referred to when necessary.



Proof: It is easy to prove (3.2) by comparing terms. Then the other assertions
follow from the repeated use of (3.2) and the resulting fact that each $\gamma$ is a
sum of one or more of the $\alpha$'s.
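
The reduction used in Lemma 4 can be tried out on a small set of numbers. The sketch below assumes the reading of $S_{\alpha_1,\cdots,\alpha_t}$ as a sum over all arrangements of t distinct indices, and checks the three-index case of (3.2), namely $S_{\alpha_1,\alpha_2}S_{\alpha_3} = S_{\alpha_1,\alpha_2,\alpha_3} + S_{\alpha_1+\alpha_3,\alpha_2} + S_{\alpha_1,\alpha_2+\alpha_3}$, together with the consequence $S_{1,1} = -S_2$ when $S_1 = 0$. The function name `S` and the data are hypothetical.

```python
from itertools import permutations
from math import prod

def S(xi, alphas):
    """Sum of xi[v1]**a1 * ... * xi[vt]**at over all arrangements
    (v1, ..., vt) of t distinct indices."""
    t = len(alphas)
    return sum(prod(xi[v] ** a for v, a in zip(nu, alphas))
               for nu in permutations(range(len(xi)), t))

xi = [1, -2, 3, 4]
a1, a2, a3 = 2, 1, 1

# Three-index case of (3.2).
left = S(xi, (a1, a2)) * S(xi, (a3,))
right = S(xi, (a1, a2, a3)) + S(xi, (a1 + a3, a2)) + S(xi, (a1, a2 + a3))

# With quantities measured from their mean, S_1 = 0 and hence S_{1,1} = -S_2.
centered = [-3, -1, 0, 4]
s1 = S(centered, (1,))
s11 = S(centered, (1, 1))
s2 = S(centered, (2,))
```

Because the inputs are integers, the identities hold exactly.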


4. The limiting distribution. In this section we obtain the generalization of
the theorem of Wald and Wolfowitz to which reference was made above.

Let $U_1, U_2, \cdots, U_N, \cdots$ be a sequence of universes, the universe $U_N$ containing
the elements⁵ $x_{\nu N}$ and let the arithmetic mean of the elements of $U_N$ be
denoted by $\bar x_N$. Furthermore, let

$$\mu_{rN} = \mu_r(U_N) = \frac{1}{N}\sum_{\nu}(x_{\nu N} - \bar x_N)^r.$$

Let $C_1, C_2, \cdots, C_n, \cdots$ be a sequence of sets of coefficients, the set $C_n$ containing
the elements $c_{jn}$ and let the arithmetic mean of the elements of $C_n$ be
denoted by $\bar c_n$. We exclude the possibility that the elements of any $C_n$ all vanish,
and hence we can suppose that $\sum_j c_{jn}^2 = 1$. Furthermore, let

$$\mu'_{rn} = \mu'_r(C_n) = \frac{1}{n}\sum_{j} c_{jn}^r.$$

Since $\sum_j (c_{jn} - \bar c_n)^2 \ge 0$, it follows that, if we define $A_n = n^{1/2}\bar c_n$, then $A_n^2 \le 1$.

5 The letter $\nu$ will assume all integral values from 1 to N. The letter r will assume all
positive integral values. The letter j will assume all integral values from 1 to n. The letter
t will assume all integral values from 1 to k. The symbol lim will stand for the limit as n or
N or both, as the case may be, increase without limit, it being understood that lim n/N ≤ 1.

Let n elements be selected at random without replacement from $U_N$ and let
us denote these elements by $x_{jn}$, the subscript j indicating the order of selection,
i.e., $x_{jn}$ is the j-th element of $U_N$ selected for the sample even though it may be
$x_{\nu N}$.

The linear function that we shall study is

$$z_n = c_{1n}x_{1n} + \cdots + c_{nn}x_{nn},$$

i.e., the value of $z_n$ is determined by multiplying the j-th element selected for the
sample by $c_{jn}$ and summing for j. Then, since $Ex_{jn} = \bar x_N$, we have

$$Ez_n = n\bar c_n\bar x_N.$$

Furthermore,

$$\sigma^2_{z_n} = \left(\frac{N}{N - 1}\right)\mu_{2N}\left(1 - \frac{nA_n^2}{N}\right).$$

To see this we first note that

$$\sum_{i \ne j = 1}^{n} c_{in}c_{jn} = n^2\bar c_n^2 - 1,$$
$$E(x_{in} - \bar x_N)^2 = \mu_{2N},$$

and, if $i \ne j$,

$$E(x_{in} - \bar x_N)(x_{jn} - \bar x_N) = -\frac{\mu_{2N}}{N - 1}.$$

From the definition of variance we have

$$\sigma^2_{z_n} = E(z_n - Ez_n)^2 = \sum_{i,j=1}^{n} c_{in}c_{jn}E(x_{in} - \bar x_N)(x_{jn} - \bar x_N),$$

and making the indicated substitutions the result follows from a few simple
manipulations.
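
The mean and variance of $z_n$ stated above can be confirmed by complete enumeration for a small universe. The following sketch is an illustration only: the universe and the coefficients are invented, the coefficients are chosen so that $\sum_j c_{jn}^2 = 1$ as the text supposes, and exact rational arithmetic is used so that the comparison involves no rounding. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def exact_moments(universe, coeffs):
    """Mean and variance of z_n over all equally likely ordered samples of
    len(coeffs) elements drawn without replacement from the universe."""
    samples = list(permutations(universe, len(coeffs)))
    z = [sum(c * x for c, x in zip(coeffs, s)) for s in samples]
    mean = sum(z) / len(z)
    var = sum((v - mean) ** 2 for v in z) / len(z)
    return mean, var

def formula_moments(universe, coeffs):
    """E z_n = n * cbar * xbar and
    var z_n = (N / (N - 1)) * mu_2N * (1 - n * A_n^2 / N), with A_n^2 = n * cbar^2.
    Assumes the squares of the coefficients sum to one."""
    N, n = len(universe), len(coeffs)
    xbar = Fraction(sum(universe), N)
    mu2 = sum((x - xbar) ** 2 for x in universe) / N
    cbar = sum(coeffs) / n
    A2 = n * cbar ** 2
    return n * cbar * xbar, Fraction(N, N - 1) * mu2 * (1 - n * A2 / N)

universe = [1, 2, 4, 7, 11]                              # N = 5
coeffs = [Fraction(3, 5), Fraction(4, 5), Fraction(0)]   # n = 3, squares sum to 1

enumerated = exact_moments(universe, coeffs)
predicted = formula_moments(universe, coeffs)
```

Both `enumerated` and `predicted` are pairs (mean, variance) of exact fractions.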

If we define $\bar x_n$ to be the arithmetic mean of $x_{1n}, \cdots, x_{nn}$, then it follows that
$n^{1/2}c_{jn} = 1$ and, as is well known,

$$E\bar x_n = \bar x_N,$$
$$\sigma^2_{\bar x_n} = \left(\frac{N - n}{N - 1}\right)\frac{\mu_{2N}}{n}.$$

Hence, if we can find the limiting distribution of

$$Z_n = \frac{z_n - Ez_n}{\sigma_{z_n}},$$

then the limiting distribution of $(\bar x_n - \bar x_N)/\sigma_{\bar x_n}$ will be a special case.

We shall need to place some sort of limitation on the sequences $U_N$ and $C_n$ if
we are to obtain theorems on limiting distributions of statistics based on them.

The condition W that we shall use is satisfied by a slightly larger class of sequences
$U_N$ and $C_n$ than that of Wald and Wolfowitz because it does not rule out
the possibility that all the elements of $C_n$ should be equal. It should be noted,
however, that for their purposes this extension of the class of sequences
$U_N$ and $C_n$ is vacuous since they required n = N, so that in their case if all the
elements of $C_n$ were equal, say k/N, we would have $z_N = k\bar x_N$ no matter in what
order the elements of $U_N$ were selected for the sample.

Condition W. The sequence $U_N$ and $C_n$ will satisfy the condition W if

$$\mu_{rN} = \mu_{2N}^{r/2}\,\lambda_r(N),$$
$$\mu'_{rn} = n^{-r/2}\,\lambda'_r(n),$$

and

$$\frac{nA_n^2}{N} \le 1 - \epsilon$$

for sufficiently large n and N, where a finite value $\lambda$ exists such that for all r

$$\sup |\lambda_r(N)| < \lambda,$$
$$\sup |\lambda'_r(n)| < \lambda,$$

and $\epsilon > 0$.
(Note that if W is satisfied for all even values of r then W is also satisfied for all
odd values of r since $\mu_{2r}\mu_{2r+2} \ge \mu_{2r+1}^2$).

A general theorem on moments is the following: 

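THEOREM 1. Let $S_{\alpha_1,\cdots,\alpha_t}$ and $S'_{\alpha_1,\cdots,\alpha_t}$ be defined in terms of $x_{\nu N} - \bar x_N$
instead of $\xi_\nu$ and let $T'_{\alpha_1,\cdots,\alpha_t}$ be the same function of the $c_{jn}$ that $S'_{\alpha_1,\cdots,\alpha_t}$ is of the
$x_{\nu N} - \bar x_N$. Furthermore, let $E_k = EZ_n^k$. Then

$$E_k = \sum_{t}\Sigma_{3t}\,\frac{C_{\alpha_1,\cdots,\alpha_t}\,S_{\alpha_1,\cdots,\alpha_t}\,T'_{\alpha_1,\cdots,\alpha_t}}{\sigma^k_{z_n}N^{(t)}}. \tag{4.1}$$

Proof: From the definition of $Z_n$ and Lemma 1, it follows that

$$\sigma^k_{z_n}E_k = \sum_{t}\Sigma_{1t}\,C_{\alpha_1,\cdots,\alpha_t}\,\Sigma_{4n}\,c_{\nu_1n}^{\alpha_1}\cdots c_{\nu_tn}^{\alpha_t}\,E(x_{\nu_1n} - \bar x_N)^{\alpha_1}\cdots(x_{\nu_tn} - \bar x_N)^{\alpha_t}.$$

Since we are selecting at random without replacement it follows that

$$N^{(t)}E(x_{\nu_1n} - \bar x_N)^{\alpha_1}\cdots(x_{\nu_tn} - \bar x_N)^{\alpha_t} = S_{\alpha_1,\cdots,\alpha_t}.$$

If we now use Lemma 2 to replace $\Sigma_{1t}$ by $\Sigma_{3t}\Sigma_{2t}$, we then obtain

$$\sigma^k_{z_n}E_k = \sum_{t}\Sigma_{3t}\,\frac{C_{\alpha_1,\cdots,\alpha_t}\,S_{\alpha_1,\cdots,\alpha_t}}{N^{(t)}}\,\Sigma_{2t}\Sigma_{4n}\,c_{\nu_1n}^{\alpha_1}\cdots c_{\nu_tn}^{\alpha_t},$$

since both $C_{\alpha_1,\cdots,\alpha_t}$ and $S_{\alpha_1,\cdots,\alpha_t}$ are invariant under permutations of $\alpha_1, \cdots, \alpha_t$.
Then from (3.1) and the definition of $T'_{\alpha_1,\cdots,\alpha_t}$, it follows that (4.1) is
proved.

Our fundamental theorem is:

THEOREM 2. If the sequences $U_N$ and $C_n$ satisfy the condition W, then

$$\lim E_{2j+1} = 0,$$

and

$$\lim E_{2j} = \frac{(2j)!}{2^j\,j!},$$

so that, for any a,

$$\lim P\{Z_n < a\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{a} e^{-x^2/2}\,dx.$$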

Proof: We wish to show that $\lim E_k$ exists and has the values given above.
First consider the parts of the typical term of $E_k$ that depend on n and N, i.e.,
the expression

$$B = \frac{S_{\alpha_1,\cdots,\alpha_t}\,T'_{\alpha_1,\cdots,\alpha_t}}{N^{(t)}\,\mu_{2N}^{k/2}\,(N/(N - 1))^{k/2}\,(1 - nA_n^2/N)^{k/2}}.$$

Since $\lim E_k$ will be the sum of the limits of a finite number of these terms, let
us first determine under what conditions B will tend to zero as n and N become
infinite.

From Lemma 4 it follows that

$$S_{\alpha_1,\cdots,\alpha_t} = \Sigma \pm S_{\gamma_1}S_{\gamma_2}\cdots S_{\gamma_h},$$

where $\gamma_1 + \cdots + \gamma_h = \alpha_1 + \cdots + \alpha_t$ and each of the $\gamma$'s is the sum of one or more
of the $\alpha$'s. From the definition of $S_{\alpha_1,\cdots,\alpha_t}$ in terms of $x_{\nu N} - \bar x_N$ it follows that
$S_1 = 0$. Hence the minimum value of all $\gamma$'s in any non-vanishing term of the
summation is 2. Consequently we can say that for all non-vanishing terms $h \le [k/2]$
and $h \le t$. Finally if condition W is satisfied then

$$S_{\gamma_1}\cdots S_{\gamma_h} = N^h\,\mu_{2N}^{k/2}\,\lambda_h(N)$$

where

$$\sup |\lambda_h(N)| < \lambda^h.$$

Similarly

$$T_{\alpha_1,\cdots,\alpha_t} = \Sigma \pm T_{\delta_1}\cdots T_{\delta_g},$$

where it may be that $T_1 \ne 0$ so that we cannot require $g \le [k/2]$ for the term
$T_{\delta_1}\cdots T_{\delta_g}$ to be non-vanishing. We still have, however, from Lemma 4 that
$g \le t$.
If condition W is satisfied, then

$$T_{\delta_1}\cdots T_{\delta_g} = n^{g-k/2}\,\lambda'(n),$$

where

$$\sup |\lambda'(n)| < \lambda^g.$$

Hence, from Lemma 4, the definitions of $\mu_{rN}$ and $\mu'_{rn}$, and condition W it follows
that B is a sum (the number of terms does not depend on n or N) of terms like

$$D = \frac{N^h\,n^{g-k/2}\,\lambda(N)\,\lambda'(n)}{N^{(t)}\,(N/(N - 1))^{k/2}\,(1 - nA_n^2/N)^{k/2}},$$

where

$$h \le [k/2], \qquad h \le t, \qquad g \le t,$$

and

$$\sup |\lambda(N)| < \infty,$$

and $\sup |\lambda'(n)| < \infty$.

Since $h \le t$, it follows that if $g < k/2$ then $\lim D = 0$. Hence, a possibly non-vanishing
term must have $g \ge k/2$ and hence $t \ge k/2$ because $t \ge g$. Furthermore,
$t \ge g + h - k/2$, since $h - k/2 \le 0$ and $t \ge g$. Hence $t - h \ge g - k/2$.
Now, we can write

$$D = \frac{n^{g-k/2}}{N^{t-h}}\,\lambda(N, n),$$

where

$$\sup |\lambda(N, n)| < \infty,$$

since $nA_n^2/N \le 1 - \epsilon$ for sufficiently large n and N.
Hence

$$\lim D = 0,$$

unless, perhaps, when $g - k/2 = t - h$, i.e., $h - k/2 = t - g$. Since $h - k/2 \le 0$
and $t - g \ge 0$, it follows that we must have $h = k/2$ and $t = g$ for $\lim D$ to be
possibly not zero.
If k is odd, then $h \le (k - 1)/2$ and hence

$$\lim E_{2j+1} = 0,$$

since all terms obtained by expanding it as above will tend to zero.
If k is even, say k = 2j, and $\lim D$ is possibly non-vanishing, then h must equal
j and we must have $\gamma_1 = \cdots = \gamma_j = 2$. Consequently, from Lemma 4, the
only possibly non-vanishing terms of $E_{2j}$ are those arising from the polynomials
$S_{\alpha_1,\cdots,\alpha_t}$, $T'_{\alpha_1,\cdots,\alpha_t}$ with $\alpha_1 = \cdots = \alpha_s = 2$, and $\alpha_{s+1} = \cdots = \alpha_t = 1$, so that
$2s + t - s = 2j$ or $t = 2j - s$, $s = 0, 1, \cdots, j$. For such values of $\alpha_1, \cdots, \alpha_t$
we have

$$C_{\alpha_1,\cdots,\alpha_t} = \frac{(2j)!}{2^s}.$$

Furthermore, as shown below, in developing $S_{\alpha_1,\cdots,\alpha_t}$ by means of Lemma 4 the
coefficient of $S_2^j$ is

$$(-1)^{j-s}\,\frac{(2j - 2s)!}{2^{j-s}\,(j - s)!}. \tag{4.2}$$

Demonstration of (4.2): If s = j, then it follows from Lemma 4 that the
coefficient of $S_2^j$ is 1. If s < j, we use Lemma 4, and noting that $S_1 = 0$, we
obtain

$$S_{\alpha_1,\cdots,\alpha_t} = -S_{\alpha_1+\alpha_t,\alpha_2,\cdots,\alpha_{t-1}} - \cdots - S_{\alpha_1,\cdots,\alpha_{t-2},\alpha_{t-1}+\alpha_t}, \tag{4.3}$$

where, since $\alpha_t = 1$, we have $\alpha_1 + \alpha_t = \alpha_2 + \alpha_t = \cdots = \alpha_s + \alpha_t = 3$, and
$\alpha_{s+1} + \alpha_t = \cdots = \alpha_{t-1} + \alpha_t = 2$. Consequently of the t - 1 terms of the
above evaluation of $S_{\alpha_1,\cdots,\alpha_t}$, exactly s will have $\alpha$'s > 2 and t - s - 1 will be
of the same form as $S_{\alpha_1,\cdots,\alpha_t}$ except that instead of s of the $\alpha$'s being 2 we have
s + 1 of the $\alpha$'s equal 2. For each such s we repeat the process obtaining

$$S_{\alpha_1,\cdots,\alpha_t} = (-1)^{j-s}(t - s - 1)(t - s - 3)\cdots 1\cdot S_{2,\cdots,2} + \text{terms which have } h < j.$$

Consequently (4.2) provides the coefficient of $S_2^j$ in $S_{\alpha_1,\cdots,\alpha_t}$. Since the other
terms of $S_{\alpha_1,\cdots,\alpha_t}$ have h < j, they lead to terms of $E_{2j}$ that vanish in the limit.
Furthermore, by Lemma 3, $T_{\alpha_1,\cdots,\alpha_t} = T'_{\alpha_1,\cdots,\alpha_t}\,s!\,(t - s)!$ and the only term
of $T_{\alpha_1,\cdots,\alpha_t}$ for which g = t is

$$T_2^s\,T_1^{t-s} = n^{(t-s)/2}A_n^{t-s}.$$

The other terms of $T_{\alpha_1,\cdots,\alpha_t}$ will lead to terms of $E_{2j}$ that vanish in the limit since
g < t. Consequently, eliminating terms known to tend to zero as n and N become
infinite, we see that $E_{2j} - f(n, N)$ tends to zero as n and N become infinite,
where

$$f(n, N) = \sum_{s=0}^{j}\frac{(2j)!}{2^s}\,(-1)^{j-s}\,\frac{(2j - 2s)!}{2^{j-s}(j - s)!}\cdot\frac{N^j\,n^{j-s}A_n^{2j-2s}}{s!\,(2j - 2s)!\,N^{(2j-s)}\,(N/(N - 1))^j\,(1 - nA_n^2/N)^j}.$$

Now as n and N become infinite with $n \le N$, we see that

$$\lim f(n, N) = \lim \frac{(2j)!}{2^j\,j!}\sum_{s=0}^{j}(-1)^{j-s}\binom{j}{s}\frac{(nA_n^2/N)^{j-s}}{(1 - nA_n^2/N)^j} = \frac{(2j)!}{2^j\,j!},$$

i.e.,

$$\lim E_{2j} = \frac{(2j)!}{2^j\,j!}.$$

To complete the proof it is only necessary to note that the normal distribution is
completely determined by its moments.⁶



6 See for example, M. G. Kendall, The Advanced Theory of Statistics, Vol. I, London,
Charles Griffin and Company, page 110.

Since Theorem 2 is a generalization of the Theorem of Wald and Wolfowitz,
it is possible to generalize slightly all the applications they make of their theorem.
The statements of these generalizations are omitted.

The application of Theorem 2 that led to this paper is the following: Suppose
that $c_{jn} = n^{-1/2}$. Then the sequence $C_n$ satisfies W and $A_n = 1$. Consequently
we have proved

Corollary 1. If the sequence $U_N$ satisfies the condition W and if $\bar x_n$ is the arithmetic
mean of a sample of n elements selected at random without replacement from
$U_N$, then, for all a,

$$\lim P\left\{\frac{n^{1/2}(\bar x_n - \bar x_N)}{\mu_{2N}^{1/2}(1 - n/N)^{1/2}} < a\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{a} e^{-x^2/2}\,dx,$$

provided that $\epsilon > 0$ exists such that $n/N < 1 - \epsilon$, if n and N are sufficiently large.

Now the sequence of $U_N$ will certainly satisfy W if $U_N$ has the same moments
for all values of N, or if the moments of $U_N$ tend to fixed values as N increases,
or if the universe $U_N$ is a random sample of a universe having these properties.
Consequently Theorem 1 and its corollaries will be valid for many applications,
among them being the case studied by F. N. David⁷ when $U_N$ has the same multinomial
distribution for each value of N.

The condition W is immediately satisfied for large classes of changing universes.
For example, if the elements of all $U_N$ are uniformly bounded and

$$\lim \mu_{2N} \ne 0,$$

then the condition W is satisfied. As an illustration, consider the case where
$U_N$ contains $Np_N$ elements having the value one and $N(1 - p_N)$ elements having
the value zero. Then

$$\mu_{2N} = p_N(1 - p_N),$$

and

$$\mu_{rN} = \frac{1}{N}\left[\sum_{\nu=1}^{Np_N}(1 - p_N)^r + \sum_{\nu=Np_N+1}^{N}(-p_N)^r\right] = p_N(1 - p_N)^r + (-1)^r(1 - p_N)p_N^r.$$

Hence

$$\frac{\mu_{rN}}{\mu_{2N}^{r/2}} = \frac{(1 - p_N)^{r/2}}{p_N^{r/2-1}} + (-1)^r\,\frac{p_N^{r/2}}{(1 - p_N)^{r/2-1}},$$

so that condition W will be satisfied if $\epsilon > 0$ exists such that $\epsilon < p_N < 1 - \epsilon$
for all sufficiently large N.
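
The moment ratio for the zero-one universe is easy to tabulate, which may help in seeing why $p_N$ must stay away from 0 and 1. The sketch below is only illustrative: it builds a universe with an arbitrarily chosen N and $p_N$, computes $\mu_{rN}/\mu_{2N}^{r/2}$ directly from the elements, and compares the result with the closed form just given. The function names and the particular values of N and $p_N$ are hypothetical.

```python
def central_moment(universe, r):
    """mu_r = (1/N) * sum over the universe of (x - mean)**r."""
    N = len(universe)
    mean = sum(universe) / N
    return sum((x - mean) ** r for x in universe) / N

def ratio_closed_form(p, r):
    """mu_r / mu_2**(r/2) for a universe of ones (proportion p) and zeros."""
    return ((1 - p) ** (r / 2) / p ** (r / 2 - 1)
            + (-1) ** r * p ** (r / 2) / (1 - p) ** (r / 2 - 1))

N, ones = 20, 5                       # p_N = 0.25
p = ones / N
universe = [1] * ones + [0] * (N - ones)

mu2 = central_moment(universe, 2)
direct = {r: central_moment(universe, r) / mu2 ** (r / 2) for r in range(3, 7)}
closed = {r: ratio_closed_form(p, r) for r in range(3, 7)}

# The ratio grows without bound as p approaches 0, which is why p_N must be
# bounded away from 0 (and, by symmetry, from 1).
ratio_small_p = ratio_closed_form(0.001, 4)
```

The dictionaries `direct` and `closed` should agree for each r up to rounding error.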

Hence the limiting distribution of Z, will be normal no matter how the propor- 
tions py change provided only that the universe Uy does not come to consist 
essentially only of zeros or only of ones. 





7 Op. cit. 










Various multivariate extensions of Theorem 2 are immediate. For example: 

THEOREM 3.⁸ Suppose that the elements of $U_N$ are vectors of two components,
$(x_{\nu N1}, x_{\nu N2})$, and that the condition W is satisfied by the sequences $C_n$, $U_{N1}$, and
$U_{N2}$ where $U_{Nh}$, h = 1, 2, contains the elements $x_{\nu Nh}$.

Let

$$z_{nh} = \sum_{j} c_{jn}x_{jnh},$$

and let

$$Z_{nh} = \frac{z_{nh} - Ez_{nh}}{\sigma_{z_{nh}}},$$

where the random variables $x_{jnh}$ are defined as were $x_{jn}$.
Let

$$\rho_N = \frac{\frac{1}{N}\sum_{\nu}(x_{\nu N1} - \bar x_{N1})(x_{\nu N2} - \bar x_{N2})}{(\mu_{2N1}\,\mu_{2N2})^{1/2}}$$

and suppose that $\lim \rho_N$ exists and is equal to $\rho$ where $\rho > -1 + \epsilon$. Then, the
limiting distribution of $Z_{n1}$ and $Z_{n2}$ is bivariate normal with means 0, variances 1,
and correlation coefficient $\rho$.

Proof: To prove Theorem 3 we shall show that any linear function
$t_1Z_{n1} + t_2Z_{n2}$ will be normally distributed in the limit if $t_1$ and $t_2$ are not both
zero. It will then follow⁹ that the theorem is true.

If we define $U_N$ to be the sequence whose elements are

$$x_{\nu N} = \frac{t_1(x_{\nu N1} - \bar x_{N1})}{\mu_{2N1}^{1/2}} + \frac{t_2(x_{\nu N2} - \bar x_{N2})}{\mu_{2N2}^{1/2}},$$

then the arithmetic mean of $U_N$ is zero. Let

$$z_n = \sum_{j} c_{jn}x_{jn},$$

and let

$$Z_n = \frac{z_n}{\sigma_{z_n}}.$$

Then, it is readily verified that

$$Z_n = \frac{t_1Z_{n1} + t_2Z_{n2}}{\sigma_{t_1Z_{n1}+t_2Z_{n2}}}.$$

8 The generalization holds for any finite number of components but, to simplify the discussion,
is stated for two components only. The method used is due to H. Cramér, Random
Variables and Probability Distributions, Cambridge University Press, London, 1937, p. 105.

9 H. Cramér, Random Variables and Probability Distributions, Cambridge University
Press, London, 1937, p. 105.

Consequently, to prove that $t_1Z_{n1} + t_2Z_{n2}$ has a normal limiting distribution, we
need to verify that the sequence $U_N$ satisfies the condition W if $U_{N1}$ and $U_{N2}$ do.
The moments of $U_N$ are

$$\mu_{rN} = \frac{1}{N}\sum_{\nu} x_{\nu N}^r,$$

so that

$$\mu_{2N} = t_1^2 + t_2^2 + 2t_1t_2\rho_N,$$

where $\rho_N$ has the usual form of the correlation coefficient. Furthermore, using
the binomial expansion, we have

$$\mu_{rN} = \sum_{a=0}^{r} C_a^r\,\frac{t_1^a\,t_2^{r-a}\,\mu_{a,r-a,N}}{\mu_{2N1}^{a/2}\,\mu_{2N2}^{(r-a)/2}}, \tag{4.4}$$

where

$$\mu_{a,r-a,N} = \frac{1}{N}\sum_{\nu}(x_{\nu N1} - \bar x_{N1})^a(x_{\nu N2} - \bar x_{N2})^{r-a}.$$

Then, by the Cauchy-Schwarz inequality we have

$$\left|\sum_{\nu}(x_{\nu N1} - \bar x_{N1})^a(x_{\nu N2} - \bar x_{N2})^{r-a}\right| \le \left[\sum_{\nu}(x_{\nu N1} - \bar x_{N1})^{2a}\,\sum_{\nu}(x_{\nu N2} - \bar x_{N2})^{2r-2a}\right]^{1/2},$$

so that

$$|\mu_{a,r-a,N}| \le \mu_{2a,N1}^{1/2}\,\mu_{2r-2a,N2}^{1/2},$$

and using condition W for $U_{N1}$ and $U_{N2}$, we have

$$\mu_{2a,N1} \le \mu_{2N1}^{a}\,\lambda_{2a}(N), \qquad \mu_{2r-2a,N2} \le \mu_{2N2}^{r-a}\,\lambda_{2r-2a}(N).$$

Hence, substituting in (4.4) we see that

$$\sup |\mu_{rN}| < \infty.$$

Hence the sequence $U_N$ satisfies the condition W for all $t_1$ and $t_2$, and Theorem
3 is proved.

From Theorem 3, it then follows that the theorems on the limiting distributions
of moments, product moments and functions of moments¹⁰ are valid for
sampling from finite universes, at random without replacement.


10 The most important of these theorems are given in H. Cramér, Mathematical Methods
of Statistics, Princeton University Press, Princeton, 1946, sections 28.2-28.4, pp. 364-367.








A NON-PARAMETRIC TEST OF INDEPENDENCE¹


By Wassily Hoeffding
Institute of Statistics, University of North Carolina


1. Summary. A test is proposed for the independence of two random variables
with continuous distribution function (d.f.). The test is consistent with respect
to the class $\Omega''$ of d.f.'s with continuous joint and marginal probability densities
(p.d.). The test statistic D depends only on the rank order of the observations.
The mean and variance of D are given and $\sqrt{n}(D - ED)$ is shown to have a
normal limiting distribution for any parent distribution. In the case of independence
this limiting distribution is degenerate, and nD has a non-normal
limiting distribution whose characteristic function and cumulants are given.
The exact distribution of D in the case of independence for samples of size
n = 5, 6, 7 is tabulated. In the Appendix it is shown that there do not exist
tests of independence based on ranks which are unbiased on any significance
level with respect to the class $\Omega''$. It is also shown that if the parent distribution
belongs to $\Omega''$ and for some $n \ge 5$ the probabilities of the n! rank permutations
are equal, the random variables are independent.


2. Introduction. In a non-parametric test of a statistical hypothesis we do
not make any assumptions about the functional form of the population
distribution. A general theory of non-parametric tests is not yet developed, and a
satisfactory definition of "best" non-parametric tests does not seem to be
available. Desirable properties of a "good" non-parametric test are unbiasedness and
consistency. A test of a hypothesis $H_0$ is said to be consistent with respect to a
specified class of admissible hypotheses if the probability of accepting $H_0$ tends
to zero with increasing sample size whenever a hypothesis $\ne H_0$ of this class
is true.

In this paper we consider the problem of testing the independence of two
random variables $X$, $Y$ on the basis of a random sample of size $n$. In all that
follows the d.f. $F(x, y)$ of $(X, Y)$ is assumed to be continuous. We will denote
by $\Omega'$ the class of continuous d.f.'s $F(x, y)$ and by $\Omega''$ the class of d.f.'s having
continuous joint and marginal p.d.'s,

$$f(x, y) = \partial^2 F(x, y)/\partial x\,\partial y, \qquad f_1(x) = \int f(x, y)\,dy, \qquad f_2(y) = \int f(x, y)\,dx.$$

The hypothesis $H_0$ to be tested is that $F(x, y)$ is of the form

$$F(x, y) = F(x, \infty)\,F(\infty, y).$$


Several tests of this hypothesis have been proposed. Among them, those which
depend only on the rank order of the observations deserve particular attention.
They will be referred to as rank tests. The critical region of a rank
test of independence with respect to the class $\Omega'$ is similar to the sample space;
the rank tests share this property with other tests obtained by the method of
randomization (cf. Scheffé [1]). A characteristic feature of a rank test is that it
remains invariant under order preserving transformations of $X$ or $Y$.


1 Research under a contract with the Office of Naval Research for development of
multivariate statistical theory.

Rank tests of independence have been studied by Hotelling and Pabst [2],
Kendall [3] and Wolfowitz [4]. While nothing is yet known about the power of
the last test, the author [5] has shown that the two former tests are asymptotically
biased for certain alternatives belonging to $\Omega'$. By a slight modification of the
examples given in [5] it can be shown that these tests are asymptotically biased
even with respect to the class $\Omega''$.

In the Appendix it is shown that there do not exist rank tests of independence
which are unbiased on any level of significance with respect to the classes $\Omega'$
or $\Omega''$. It will appear from this paper that there do exist rank tests of
independence which are consistent, and hence asymptotically unbiased, at least with
respect to $\Omega''$.


3. The Functional $\Delta(F)$. Given a random sample from a population with a
d.f. belonging to a class $\Omega$, we want to test the hypothesis $H_0$ that $F$ is in a
subclass $\omega$ of $\Omega$. It is easy to construct a consistent test of $H_0$ if there exist (a) a
functional $\theta(F)$ defined for every $F$ in $\Omega$ and such that $\theta(F) = 0$ if and only if
$F \in \omega$; and (b) a consistent estimate of $\theta(F)$. There are many ways of devising
consistent tests of independence by this method. The particular test described
in the sequel has been chosen mainly for its relative simplicity.

If $F(x, y)$ is a bivariate d.f., let

$$D(x, y) = F(x, y) - F(x, \infty)\,F(\infty, y)$$

and

$$\Delta = \Delta(F) = \int D^2(x, y)\,dF(x, y). \tag{3.1}$$

Here and in the following, when no domain of integration is indicated, the
(Lebesgue-Stieltjes) integral is extended over the entire space (here $R_2$).

The random variables $X$, $Y$ with the d.f. $F(x, y)$ are independent if and only
if $D(x, y) \equiv 0$.

THEOREM 3.1. When $F(x, y)$ belongs to $\Omega''$, $\Delta(F) = 0$ if and only if $D(x, y) \equiv 0$.

PROOF. Evidently $D(x, y) \equiv 0$ implies $\Delta(F) = 0$.

Now suppose that $D(x, y) \not\equiv 0$. Since $F(x, y)$ is in $\Omega''$, the function $d(x, y) =
f(x, y) - f_1(x) f_2(y)$ is continuous. We have

$$D(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} d(u, v)\,dv\,du.$$

$D(x, y) \not\equiv 0$ implies $d(x, y) \not\equiv 0$, and since

$$\iint d(x, y)\,dx\,dy = 0,$$

there exists a rectangle $Q$ in $R_2$ such that $d(x, y) > 0$ if $(x, y)$ is in $Q$. Hence
$D(x, y) \ne 0$ almost everywhere in $Q$, and $f(x, y) > 0$ in $Q$. Thus

$$\Delta(F) \ge \iint_Q D^2(x, y)\, f(x, y)\,dx\,dy > 0.$$

This completes the proof.
If $F(x, y)$ is discontinuous, we can have $\Delta(F) = 0$ and $D(x, y) \not\equiv 0$. This is,
for instance, the case for the distribution

$$P\{X = 0,\ Y = 1\} = P\{X = 1,\ Y = 0\} = \tfrac{1}{2}.$$

The question remains open whether $\Delta = 0$ implies $D(x, y) \equiv 0$ if $F(x, y)$ is
continuous or absolutely continuous.
In Section 7 it will be shown that

$$0 \le \Delta \le \tfrac{1}{30}.$$

The upper bound $\frac{1}{30}$ is attained when $F(x, y)$ is the (continuous) d.f. of a
random variable $(X, Y)$ such that $X$ has any continuous d.f. and $Y = X$ (or,
more generally, $Y$ is a monotone function of $X$).
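Let

$$C(u) = \begin{cases} 1 & \text{if } u \ge 0, \\ 0 & \text{if } u < 0, \end{cases}$$

$$\psi(x_1, x_2, x_3) = C(x_1 - x_2) - C(x_1 - x_3), \tag{3.2}$$

$$\phi(x_1, y_1; \ldots; x_5, y_5) = \tfrac{1}{4}\,\psi(x_1, x_2, x_3)\,\psi(x_1, x_4, x_5)\,\psi(y_1, y_2, y_3)\,\psi(y_1, y_4, y_5).$$

Then we can write

$$\Delta = \int \cdots \int \phi(x_1, y_1; \ldots; x_5, y_5)\,dF(x_1, y_1) \cdots dF(x_5, y_5). \tag{3.3}$$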


4. The Statistic D. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a random sample from
a population with the d.f. $F(x, y)$, $n \ge 5$, and let

$$D = D_n = \frac{1}{n(n-1)\cdots(n-4)} \sum{}'' \phi(X_{\alpha_1}, Y_{\alpha_1}; \ldots; X_{\alpha_5}, Y_{\alpha_5}), \tag{4.1}$$

where $\sum''$ denotes summation over all $\alpha$ such that

$$\alpha_i = 1, \ldots, n; \qquad \alpha_i \ne \alpha_j \text{ if } i \ne j \qquad (i, j = 1, \ldots, 5).$$

Since the number of terms in $\sum''$ is $n(n-1)\cdots(n-4)$, we have by (3.3)

$$ED = \Delta. \tag{4.2}$$

Since in the case of independence $ED = 0$, $D$ can assume both positive and
negative values. It will be seen in Section 7 that $-\frac{1}{60} \le D_n \le \frac{1}{30}$, the upper
bound $\frac{1}{30}$ being attained for every $n$, while the minimum of $D_n$ apparently
increases with $n$.

The random variable $D$ as defined by (4.1) belongs to the class of U-statistics
considered by the author [5]. The following properties of $D$ follow immediately
from the results of that paper:

I. Let

$$\Phi(x_1, y_1; \ldots; x_5, y_5) = \frac{1}{5!} \sum \phi(x_{\alpha_1}, y_{\alpha_1}; \ldots; x_{\alpha_5}, y_{\alpha_5}),$$

the sum being extended over all permutations $(\alpha_1, \ldots, \alpha_5)$ of $(1, \ldots, 5)$,

$$\Phi_k(x_1, y_1; \ldots; x_k, y_k) = \int \cdots \int \Phi(x_1, y_1; \ldots; x_k, y_k; x_{k+1}, y_{k+1}; \ldots; x_5, y_5)\,dF(x_{k+1}, y_{k+1}) \cdots dF(x_5, y_5) \qquad (k = 1, \ldots, 5),$$

$$\zeta_k = \int \cdots \int \{\Phi_k(x_1, y_1; \ldots; x_k, y_k) - \Delta\}^2\,dF(x_1, y_1) \cdots dF(x_k, y_k).$$

Then the variance of $D_n$ is

$$\operatorname{var} D_n = \binom{n}{5}^{-1} \sum_{k=1}^{5} \binom{5}{k} \binom{n-5}{5-k}\, \zeta_k. \tag{4.3}$$

We have

$$25\,\zeta_1 \le n \operatorname{var} D_n \le 5\,\zeta_5.$$

$n \operatorname{var} D_n$ is a decreasing function of $n$, and

$$\lim_{n \to \infty} n \operatorname{var} D_n = 25\,\zeta_1. \tag{4.4}$$

II. By Theorem 7.1, [5], the random variable $\sqrt{n}(D_n - \Delta)$ has a normal
limiting distribution with mean zero and variance $25\,\zeta_1$.

It will be seen in section 6 that in the case of independence $\zeta_1 = 0$, so that
the normal limiting distribution of $\sqrt{n}\,D_n$ is a degenerate one. In this case
$nD_n$ has a non-normal limiting distribution (see section 8).
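
The definition (4.1) can be checked numerically on very small samples. The following is a minimal sketch, not part of the original paper: it evaluates the kernel $\phi$ of (3.2) over all ordered 5-tuples of distinct sample points, using exact rational arithmetic. The function names are illustrative, and the sample is assumed to contain no ties. The cost grows like $n^5$, so it is useful only as a reference implementation.

```python
from fractions import Fraction
from itertools import permutations


def c(u):
    """Indicator C(u) of (3.2): 1 if u >= 0, else 0."""
    return 1 if u >= 0 else 0


def psi(x1, x2, x3):
    """psi(x1, x2, x3) = C(x1 - x2) - C(x1 - x3), as in (3.2)."""
    return c(x1 - x2) - c(x1 - x3)


def phi(p1, p2, p3, p4, p5):
    """Kernel phi of (3.2); each argument is an (x, y) pair."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5) = p1, p2, p3, p4, p5
    product = (psi(x1, x2, x3) * psi(x1, x4, x5)
               * psi(y1, y2, y3) * psi(y1, y4, y5))
    return Fraction(product, 4)


def d_statistic_bruteforce(sample):
    """D_n of (4.1): average of phi over all ordered 5-tuples of distinct points."""
    n = len(sample)
    if n < 5:
        raise ValueError("D_n is defined only for n >= 5")
    total = sum(phi(*five) for five in permutations(sample, 5))
    return total / (n * (n - 1) * (n - 2) * (n - 3) * (n - 4))
```

For a perfectly monotone sample such as `[(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]` the sketch returns the upper bound 1/30 stated above.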


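5. Computation of D. From (4.1) and (3.2) we get after reduction

$$D_n = \frac{A - 2(n-2)B + (n-2)(n-3)C}{n(n-1)(n-2)(n-3)(n-4)}, \tag{5.1}$$

where

$$A = \sum_{\alpha=1}^{n} a_\alpha(a_\alpha - 1)\, b_\alpha(b_\alpha - 1), \qquad
B = \sum_{\alpha=1}^{n} (a_\alpha - 1)(b_\alpha - 1)\, c_\alpha, \qquad
C = \sum_{\alpha=1}^{n} c_\alpha(c_\alpha - 1), \tag{5.2}$$

and

$$a_\alpha = \sum_{\beta=1}^{n} C(X_\alpha - X_\beta) - 1, \qquad
b_\alpha = \sum_{\beta=1}^{n} C(Y_\alpha - Y_\beta) - 1,$$

$$c_\alpha = \sum_{\beta=1}^{n} C(X_\alpha - X_\beta)\, C(Y_\alpha - Y_\beta) - 1.$$

$a_\alpha + 1$ and $b_\alpha + 1$ are the ranks of $X_\alpha$ and $Y_\alpha$, respectively. $c_\alpha$ is the number
of sample members $(X_\beta, Y_\beta)$ for which both $X_\beta < X_\alpha$ and $Y_\beta < Y_\alpha$. (Since
$F(x, y)$ is continuous we may assume that $X_\alpha \ne X_\beta$ and $Y_\alpha \ne Y_\beta$ if $\alpha \ne \beta$.)

Thus, to compute $D$ for a given sample we have to determine the numbers
$a_\alpha$, $b_\alpha$, $c_\alpha$ for each sample member, calculate $A$, $B$, $C$ from (5.2) and insert
them in (5.1).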
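
The computing scheme just described can be sketched in a few lines. This is an illustrative implementation, not the author's: the function name `d_statistic` is hypothetical, the sample is assumed to be free of ties (as the continuity of $F$ permits), and the counts $a_\alpha$, $b_\alpha$, $c_\alpha$ are obtained by direct comparison, which costs $O(n^2)$ operations.

```python
from fractions import Fraction


def d_statistic(xs, ys):
    """D_n via (5.1)-(5.2); assumes no ties among xs and none among ys."""
    n = len(xs)
    if n < 5 or len(ys) != n:
        raise ValueError("need two sequences of equal length n >= 5")
    A = B = C = 0
    for xa, ya in zip(xs, ys):
        # a + 1 and b + 1 are the ranks of xa and ya
        a = sum(1 for x in xs if x <= xa) - 1
        b = sum(1 for y in ys if y <= ya) - 1
        # c counts the other points lying below (xa, ya) in both coordinates
        c = sum(1 for x, y in zip(xs, ys) if x <= xa and y <= ya) - 1
        A += a * (a - 1) * b * (b - 1)
        B += (a - 1) * (b - 1) * c
        C += c * (c - 1)
    numerator = A - 2 * (n - 2) * B + (n - 2) * (n - 3) * C
    denominator = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return Fraction(numerator, denominator)
```

For example, `d_statistic([1, 2, 3, 4, 5], [1, 4, 3, 2, 5])` gives $A = 148$, $B = 37$, $C = 12$ and hence $D_5 = -2/120 = -1/60$. Because only order comparisons are used, any order preserving transformation of the inputs leaves the result unchanged.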


6. The variance of D in the case of independence. Since $F(x, y)$ is assumed
to be continuous, so are $F(x, \infty)$ and $F(\infty, y)$. The inequalities $x_1 < x_2$ and
$F(x_1, \infty) < F(x_2, \infty)$ are then equivalent unless $F(x_1, \infty) = F(x_2, \infty)$. The
same is true of $y_1 < y_2$ and $F(\infty, y_1) < F(\infty, y_2)$. This shows that the function $\phi$,
(3.2), does not change its value if $x_i$, $y_i$ is replaced by $F(x_i, \infty)$, $F(\infty, y_i)$, except
perhaps on a set of zero probability. Hence $\Delta$ and $D$ are invariant under the
transformation

$$u = F(x, \infty), \quad v = F(\infty, y); \qquad U = F(X, \infty), \quad V = F(\infty, Y).$$

In the case of independence we have $F(x, y) = uv$, and

$$\zeta_k = \int_0^1 \cdots \int_0^1 \{\phi_k(u_1, v_1; \ldots; u_k, v_k)\}^2\,du_1\,dv_1 \cdots du_k\,dv_k,$$

where $\phi_k$ is defined as $\Phi_k$, with $x_i$, $y_i$ and $F(x_i, y_i)$ replaced by $u_i$, $v_i$ and $u_i v_i$
respectively. On evaluation of these definite integrals we get

$$\zeta_1 = 0, \qquad 200 \cdot 30^2\,\zeta_2 = \tfrac{2}{9}, \qquad 600 \cdot 30^2\,\zeta_3 = \tfrac{14}{3},$$

$$600 \cdot 30^2\,\zeta_4 = \tfrac{164}{9}, \qquad 120 \cdot 30^2\,\zeta_5 = 12.$$

On inserting these values in (4.3) we obtain

$$\operatorname{var}(30 D_n) = \frac{2(n^2 + 5n - 32)}{9n(n-1)(n-3)(n-4)}. \tag{6.1}$$

Another way to determine the coefficients $\zeta_k$ in the case of independence is to
compute $\operatorname{var} D_n$ for $n = 5, 6, 7$ from the exact distributions given in section 7,
and $\lim_{n \to \infty} n^2 \operatorname{var} D_n$ from the asymptotic distribution of $nD_n$ (section 8).
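
The passage from (4.3) to (6.1) is plain arithmetic and can be verified mechanically. The sketch below is an illustration only: the dictionary holds the values of $30^2 \zeta_k$ in the case of independence, read from the scan as $\zeta_1 = 0$, $200 \cdot 30^2 \zeta_2 = 2/9$, $600 \cdot 30^2 \zeta_3 = 14/3$, $600 \cdot 30^2 \zeta_4 = 164/9$, $120 \cdot 30^2 \zeta_5 = 12$ (an assumption about the garbled figures, checked here against the closed form). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

# 30^2 * zeta_k in the case of independence (values as read from the text)
ZETA_TIMES_900 = {
    1: Fraction(0),
    2: Fraction(2, 9) / 200,
    3: Fraction(14, 3) / 600,
    4: Fraction(164, 9) / 600,
    5: Fraction(12) / 120,
}


def var_30d_from_zetas(n):
    """var(30 D_n) from the general U-statistic formula (4.3)."""
    total = sum(comb(5, k) * comb(n - 5, 5 - k) * z
                for k, z in ZETA_TIMES_900.items())
    return total / comb(n, 5)


def var_30d_closed_form(n):
    """var(30 D_n) from the closed form (6.1)."""
    return Fraction(2 * (n * n + 5 * n - 32),
                    9 * n * (n - 1) * (n - 3) * (n - 4))
```

The two functions agree for every $n \ge 5$; for $n = 5$ both give $1/10$, and $n^2 \operatorname{var}(30 D_n)$ tends to $2/9$ as $n$ grows.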


7. The exact distribution of D in the case of independence for n = 5, 6, 7.
Let $S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ be a sample from a population with a
continuous d.f. We may confine ourselves to samples with $x_i \ne x_j$ and $y_i \ne y_j$ if
$i \ne j$. Let $(x'_1, y'_{\beta_1}), \ldots, (x'_n, y'_{\beta_n})$ be a rearrangement of $(x_1, y_1), \ldots, (x_n, y_n)$
such that $x'_1 < x'_2 < \cdots < x'_n$ and $y'_1 < y'_2 < \cdots < y'_n$. The permutation
$\Pi = (\beta_1, \ldots, \beta_n)$ of $(1, \ldots, n)$ will be referred to as the ranking of the sample $S$.

$D_n$ depends only on the ranking of the sample. We shall express this by
writing $D_n = D_n(\Pi) = D_n(\beta_1, \ldots, \beta_n)$. If $(\beta'_1, \ldots, \beta'_m)$ is a permutation
of $m\,(< n)$ of the integers $1, \ldots, n$ such that $\beta'_{\alpha_1} < \beta'_{\alpha_2} < \cdots < \beta'_{\alpha_m}$,
$D_m(\beta'_1, \ldots, \beta'_m)$ is defined to be equal to $D_m(\alpha_1, \ldots, \alpha_m)$. Replacing in
(4.1) $(X_\alpha, Y_\alpha)$ by $(\alpha, \beta_\alpha)$ we find

$$D_n(\beta_1, \ldots, \beta_n) = \binom{n}{5}^{-1} \sum{}' D_5(\beta_{\alpha_1}, \ldots, \beta_{\alpha_5}), \tag{7.1}$$

where $\sum'$ stands for summation over all $\alpha$ such that $1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_5 \le n$.
Denoting by $\Pi^{(i)}$ the permutation obtained from $\Pi = (\beta_1, \ldots, \beta_n)$ by
omitting $\beta_i$, we have the recursion formula

$$n D_n(\Pi) = \sum_{i=1}^{n} D_{n-1}(\Pi^{(i)}). \tag{7.2}$$

From (4.1) and (3.2) we obtain

$$60\,D_5(\beta_1, \ldots, \beta_5) = \psi(\beta_3, \beta_1, \beta_4)\,\psi(\beta_3, \beta_2, \beta_5) + \psi(\beta_3, \beta_1, \beta_5)\,\psi(\beta_3, \beta_2, \beta_4)$$

or

$$60\,D_5(\beta_1, \ldots, \beta_5) = \begin{cases}
0 & \text{if } \beta_3 \ne 3; \\
2 & \text{if } \beta_3 = 3 \text{ and } \beta_1, \beta_2 < 3 \text{ or } \beta_1, \beta_2 > 3; \\
-1 & \text{if } \beta_3 = 3 \text{ and } \beta_1 < 3,\ \beta_2 > 3 \text{ or } \beta_1 > 3,\ \beta_2 < 3.
\end{cases} \tag{7.3}$$

We have

$$D_n(\beta_1, \ldots, \beta_n) = D_n(\beta_2, \beta_1, \beta_3, \ldots, \beta_n)
= D_n(\beta_1, \ldots, \beta_{n-2}, \beta_n, \beta_{n-1}) = D_n(\beta_n, \beta_{n-1}, \ldots, \beta_1). \tag{7.4}$$

For $n = 5$ this follows from (7.3) and for general $n$ from (7.1).

Also, by the symmetry of $D_n$ with respect to $x$ and $y$, $D_n$ does not change its
value if in the permutation $(\beta_1, \ldots, \beta_n)$ the numbers 1, 2 or $n - 1$, $n$ are
interchanged or the permutation is replaced by its inverse.

In the case of independence all $n!$ rankings have the same probability $1/n!$.
To find the distribution of $D_n$ we have to determine the number of rankings
giving rise to particular values of $D_n$.

If $n = 5$ there are $5! = 120$ rankings. Owing to (7.4) we need consider only
those with $\beta_1 < \beta_2$, $\beta_4 < \beta_5$, $\beta_1 < \beta_4$. Their number is $\frac{120}{8} = 15$. Among
them those with $\beta_3 \ne 3$ yield $D_5 = 0$; this leaves only the three permutations

$$(1, 2, 3, 4, 5), \qquad (1, 4, 3, 2, 5), \qquad (1, 5, 3, 2, 4).$$

By (7.3) the respective values of $60 D_5$ are $2$, $-1$, $-1$. Thus we have

$$P\{60 D_5 = 2\} = \tfrac{1}{15}, \qquad P\{60 D_5 = -1\} = \tfrac{2}{15}, \qquad P\{60 D_5 = 0\} = \tfrac{12}{15}.$$

The distribution of $D_6, D_7, \ldots$ can be obtained in a similar way using the
relations (7.1) to (7.4). The distribution of $D_n$ for $n = 5, 6, 7$ is given in
Table I.

From (7.3) and (7.1) it follows that $-\frac{1}{60} \le D_n \le \frac{1}{30}$ for $n = 5, 6, \ldots$.
The upper bound $\frac{1}{30}$ is attained for $\Pi = (1, 2, \ldots, n)$ and every $n$. To judge
by the cases $n = 5, 6, 7$, the minimum of $D_n$ apparently increases with $n$. From
$ED_n = \Delta$ it also follows that $\Delta \le \frac{1}{30}$.
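
For small $n$ the enumeration described in this section is easy to mechanize. The sketch below is illustrative and does not exploit the symmetries (7.4): it simply visits all $n!$ rankings, evaluates $60 D_5$ on every 5-subset by the two-term $\psi$ expression preceding (7.3), and adds the results as in (7.1). The sum equals $60\binom{n}{5} D_n$, that is $60 D_5$, $360 D_6$ and $1260 D_7$ for $n = 5, 6, 7$. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations


def psi(x1, x2, x3):
    """psi of (3.2) for distinct arguments."""
    return (1 if x1 >= x2 else 0) - (1 if x1 >= x3 else 0)


def sixty_d5(b):
    """60 * D_5 for a sequence b of five distinct numbers."""
    b1, b2, b3, b4, b5 = b
    return (psi(b3, b1, b4) * psi(b3, b2, b5)
            + psi(b3, b1, b5) * psi(b3, b2, b4))


def null_counts(n):
    """Map each value of 60 * C(n, 5) * D_n to the number of rankings giving it."""
    counts = Counter()
    for ranking in permutations(range(1, n + 1)):
        total = sum(sixty_d5([ranking[i] for i in idx])
                    for idx in combinations(range(n), 5))
        counts[total] += 1
    return counts
```

`null_counts(5)` yields 8, 16 and 96 rankings for the values $2$, $-1$, $0$ of $60 D_5$, i.e. the probabilities $\frac{1}{15}$, $\frac{2}{15}$, $\frac{12}{15}$ found above. For $n = 6$ the keys are values of $360 D_6$, twice the scale used in Table I.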


8. The Asymptotic Distribution of $nD_n$ in the Case of Independence.

THEOREM 8.1. If $F(x, y) = F(x, \infty)F(\infty, y)$ and $F(x, \infty)$ and $F(\infty, y)$ are
continuous, the random variable $nD_n + \frac{1}{36}$ has a limiting distribution whose
characteristic function (c.f.) is

$$g(t) = \prod_{k=1}^{\infty} \left(1 - \frac{2it}{\pi^4 k^2}\right)^{-\frac{1}{2}\tau(k)}, \tag{8.1}$$

where $\tau(k)$ is the number of divisors of $k$.

Note that $\tau(k)$ is the number of divisors of $k$ including 1 and $k$. Thus $\tau(1) = 1$,
$\tau(2) = 2$, $\tau(3) = 2$, $\tau(4) = 3, \ldots$.

The author has not been able to bring the d.f. corresponding to the c.f. $g(t)$
into a form suitable for numerical computation. Thus Theorem 8.1 may be
considered as a preliminary result. For this reason only a brief indication of
the proof is given here.

If $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a random sample from a population with d.f.
$F(x, \infty)F(\infty, y)$, let $nS_n(x, y)$ be the number of sample members $(X_i, Y_i)$ such
that $X_i \le x$, $Y_i \le y$. $S_n(x, y)$ is a d.f. depending on the random sample. If
we put $F(x, y) = S_n(x, y)$ in $\Delta(F)$ as defined by (3.3), we get

$$\Delta(S_n) = n^{-5} \sum_{\alpha_1 = 1}^{n} \cdots \sum_{\alpha_5 = 1}^{n} \phi(X_{\alpha_1}, Y_{\alpha_1}; \ldots; X_{\alpha_5}, Y_{\alpha_5}).$$

It is easy to prove that if $n\{\Delta(S_n) - E\Delta(S_n)\}$ has a limiting distribution, it is
the same as that of $nD_n$.

Now it can be shown that $n\Delta(S_n)$ has a limiting distribution with the c.f. (8.1).
This can be done either analogously to Smirnoff's [6] derivation of the limiting
distribution of the goodness of fit statistic $\omega^2$, or applying von Mises' [7] general
results on the asymptotic distribution of a differentiable statistical function.
Though the latter paper deals only with univariate distributions, its results can
be extended to the multivariate case.

By expanding $\log g(t)$ in powers of $it$ we obtain for the $j$-th cumulant $\kappa_j$

$$\kappa_j = \frac{2^{5j-3}\,(j-1)!}{[(2j)!]^2}\, B_{2j-1}^2,$$

where $B_{2j-1}$ are Bernoulli's numbers,

$$B_1 = \tfrac{1}{6}, \quad B_3 = \tfrac{1}{30}, \quad B_5 = \tfrac{1}{42}, \quad B_7 = \tfrac{1}{30}, \ \ldots$$

In particular, $\kappa_1 = \frac{1}{36}$, and since $ED_n = 0$, the limiting distribution of $n\Delta(S_n)$
is that of $nD_n + \frac{1}{36}$.
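
The cumulant formula lends itself to a quick numerical check. The sketch below is illustrative: it evaluates $\kappa_j$ exactly for the first four Bernoulli numbers listed above and, separately, approximates $\kappa_1$ by the partial sum $\sum_k \tau(k)/(\pi^4 k^2)$ obtained from the logarithm of (8.1). The function names are hypothetical and the truncation point is an arbitrary choice.

```python
from fractions import Fraction
from math import factorial, pi

# Bernoulli numbers in the indexing used in the text: B_1, B_3, B_5, B_7
BERNOULLI = {1: Fraction(1, 6), 3: Fraction(1, 30),
             5: Fraction(1, 42), 7: Fraction(1, 30)}


def cumulant(j):
    """kappa_j = 2^(5j-3) (j-1)! B_{2j-1}^2 / ((2j)!)^2, for j = 1, ..., 4."""
    b = BERNOULLI[2 * j - 1]
    return Fraction(2 ** (5 * j - 3) * factorial(j - 1),
                    factorial(2 * j) ** 2) * b * b


def divisor_count(k):
    """tau(k): number of divisors of k, including 1 and k."""
    return sum(1 for d in range(1, k + 1) if k % d == 0)


def kappa1_partial_sum(terms):
    """Partial sum of the series for kappa_1 read off from log g(t)."""
    return sum(divisor_count(k) / (pi ** 4 * k * k)
               for k in range(1, terms + 1))
```

`cumulant(1)` is $1/36$ and `cumulant(2)` is $1/4050$, so the limiting variance of $30\,nD_n$ is $900/4050 = 2/9$, in agreement with (6.1).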


9. The D-test of Independence. Given a random sample from a bivariate
population with continuous d.f., a test for independence can now be carried out
as follows:

If $\alpha$ $(0 < \alpha < 1)$ is the desired level of significance, let $\rho_n$ be the smallest number
satisfying the inequality

$$P\{D_n > \rho_n \mid F \in \omega\} \le \alpha,$$

where $\omega$ is the class of d.f.'s of the form $F(x, \infty)F(\infty, y)$.

Compute $D_n$ as shown in section 5. Reject the hypothesis $H_0$ of independence
if and only if $D_n > \rho_n$.

For $n = 5, 6, 7$ the numbers $\rho_n$ can be obtained from Table I.

From Tchebycheff's inequality and (6.1) we have

$$P\left\{30 D_n \ge \sqrt{\frac{2(n^2 + 5n - 32)}{9n(n-1)(n-3)(n-4)\,\alpha}}\right\} \le \alpha.$$

Hence

$$\rho_n \le \frac{1}{30} \sqrt{\frac{2(n^2 + 5n - 32)}{9n(n-1)(n-3)(n-4)\,\alpha}}.$$

It follows that $\rho_n = O(n^{-1})$.
If $\Delta > 0$, we have $\Delta - \rho_n > 0$ for sufficiently large $n$. Then

$$P\{D_n > \rho_n\} \ge P\{|D_n - \Delta| < \Delta - \rho_n\} \ge 1 - (\operatorname{var} D_n)/(\Delta - \rho_n)^2.$$

By (4.4) the right hand side tends to 1.

This, together with Theorem 3.1, shows that the D-test is consistent with
respect to the class $\Omega''$.

Since $P\{D_n \le 0\}$ tends to 0 if $\Delta > 0$, it is safe not to reject $H_0$ whenever
$D_n \le 0$. An inspection of Table I shows that at least for small $n$ this will
happen in more than one-half of the cases if $H_0$ is true.
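
As an illustration of how $\rho_n$ is read from an exact null distribution, the following sketch works from the $n = 6$ column of Table I, entered as a dictionary of counts, and compares the result with the Tchebycheff bound. It is a sketch under stated assumptions: the function names are hypothetical, and the table values are those of the reconstructed Table I.

```python
from fractions import Fraction
from math import sqrt

# Null distribution for n = 6 (Table I): value z of 180*D_6 -> 90 * P{180 D_6 = z}
TABLE_N6 = {-2: 4, -1: 28, 0: 36, 1: 16, 2: 1, 3: 4, 6: 1}


def critical_value(table, scale, alpha):
    """Smallest rho with P{D_n > rho} <= alpha; table maps scale*D_n to counts."""
    total = sum(table.values())
    for z in sorted(table):
        upper_tail = sum(count for value, count in table.items() if value > z)
        if Fraction(upper_tail, total) <= alpha:
            return Fraction(z, scale)
    raise ValueError("alpha must satisfy 0 < alpha < 1")


def chebyshev_bound(n, alpha):
    """Upper bound for rho_n obtained from Tchebycheff's inequality and (6.1)."""
    var_30d = 2 * (n * n + 5 * n - 32) / (9 * n * (n - 1) * (n - 3) * (n - 4))
    return sqrt(var_30d / alpha) / 30
```

With $\alpha = 0.05$ this gives $\rho_6 = 3/180 = 1/60$, since $P\{180 D_6 > 3\} = 1/90$ while $P\{180 D_6 > 2\} = 5/90$ exceeds $0.05$. The Tchebycheff bound for the same case is about $0.031$, which shows how conservative it is for small $n$.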


10. Concluding Remarks. It would be interesting to compare the power of
the D-test with that of other tests with respect to particular alternatives, for
instance with the product moment correlation test when the population is normal
with correlation $\rho$. A preliminary investigation seems to indicate that for small
values of $|\rho|$ and $n \to \infty$ the power efficiency of the D-test as compared with the
product moment correlation test is rather low. This result may not be conclusive
for values of $n$ which are of practical interest. On the other hand, it may be
expected that a test which is consistent with respect to a large class of alternatives
will have a lower power with regard to a sub-class of alternatives than a test
which has optimum properties with respect to this particular sub-class. These
considerations suggest the problem of selecting from a given class of
non-parametric tests (such as those consistent with respect to $\Omega''$) a test which is most
powerful with respect to certain parametric alternatives (such as normal
distributions).














TABLE I
The distribution of D_n in the case of independence for n = 5, 6, 7.

n = 5
    z    15 P{60 D_5 = z}    P{60 D_5 ≥ z}
   -1           2               1.0000
    0          12               0.8667
    2           1               0.0667

n = 6
    z    90 P{180 D_6 = z}   P{180 D_6 ≥ z}
   -2           4               1.0000
   -1          28               0.9556
    0          36               0.6444
    1          16               0.2444
    2           1               0.0667
    3           4               0.0556
    6           1               0.0111

n = 7
    z    630 P{1260 D_7 = z}  P{1260 D_7 ≥ z}
  -11           8               1.0000
   -8          32               0.9873
   -7          32               0.9365
   -6           8               0.8857
   -5          28               0.8730
   -4          88               0.8286
   -3          64               0.6889
   -2          56               0.5873
   -1           8               0.4984
    0          88               0.4857
    2          77               0.3460
    3          24               0.2238
    4           4               0.1857
    6          56               0.1794
    8           8               0.0905
    9           4               0.0778
   12          24               0.0714
   14           2               0.0333
   18          12               0.0302
   24           2               0.0111
   30           4               0.0079
   42           1               0.0016

APPENDIX 


A. Equiprobable rankings and independence. Let $\Pi_{n\nu}$ $(\nu = 1, 2, \ldots, n!)$
be the $n!$ possible rankings of samples of size $n$ from a bivariate population with
continuous d.f. $F(x, y)$ (cf. section 7).

If $F(x, y) = F(x, \infty)F(\infty, y)$ we have

$$P\{\Pi_{n\nu}\} = 1/n! \qquad (\nu = 1, \ldots, n!) \tag{A1}$$

for every $n$.
Does (A1) for some particular $n$ imply independence? This is not true for
$n = 2$. In this case (A1) is equivalent to $P\{(1, 2)\} = \frac{1}{2}$. If the distribution
has a p.d. $f(x, y)$, we have

$$P\{(1, 2)\} = \iint \left[\int_{-\infty}^{x}\!\int_{-\infty}^{y} f(u, v)\,du\,dv + \int_{x}^{\infty}\!\int_{y}^{\infty} f(u, v)\,du\,dv\right] f(x, y)\,dx\,dy,$$

which equals $\frac{1}{2}$ whenever $f(x, y) = f(-x, y)$. However, we have the following
theorem:
THEOREM. If $F(x, y)$ is in $\Omega''$ and (A1) holds for some $n \ge 5$, then

$$F(x, y) = F(x, \infty)F(\infty, y). \tag{A2}$$

PROOF. (4.2) can be written in the form

$$\sum_{\nu=1}^{n!} D_n(\Pi_{n\nu})\, P\{\Pi_{n\nu}\} = \Delta. \tag{A3}$$

If (A1) holds, the left hand side of (A3) has the same value as when (A2) is true.
But in the latter case we have $\Delta = 0$. Hence (A1) implies $\Delta = 0$. By Theorem
3.1 this is sufficient for (A2). The proof is complete.


B. Non-existence of unbiased rank tests of independence.

THEOREM. There do not exist rank tests of independence which are unbiased on
any significance level with respect to the classes $\Omega'$ or $\Omega''$.

PROOF: Let $\Pi_{n\nu}$ have the meaning of Appendix A. Any critical region of a
rank test of independence is a set $S_m = \{\Pi_{n\nu_1}, \ldots, \Pi_{n\nu_m}\}$ of $m$ rankings. In
the case of independence $P(S_m) = P\{\Pi_{n\nu} \in S_m\} = m/n!$. We may confine
ourselves to significance levels $m/n!$, $m = 1, 2, \ldots, n! - 1$. To prove the
theorem it is sufficient to show that for every $n = 3, 4, \ldots$, for some
$m$ $(1 \le m \le n! - 1)$ and every $S_m$ there exists a d.f. $F$ in $\Omega'$ such that

$$P(S_m \mid F) < m/n!.$$

We shall prove the slightly more general proposition that this holds for
$m = 1, 2, 3$.

Let the bivariate distribution $A_n$ be such that the probability mass is
distributed uniformly on the $n - 1$ segments

$$y = x + \frac{n - 2k}{n - 1}, \qquad \frac{k-1}{n-1} \le x \le \frac{k}{n-1} \qquad (k = 1, 2, \ldots, n - 1), \tag{B1}$$

and is zero in any region not containing a part of these segments.
Let $B_n$ be the distribution which is uniform on the $n - 1$ segments

$$y = -x + \frac{2k - 1}{n - 1}, \qquad \frac{k-1}{n-1} \le x \le \frac{k}{n-1} \qquad (k = 1, 2, \ldots, n - 1), \tag{B2}$$

and zero elsewhere.
The d.f.'s of both $A_n$ and $B_n$ are continuous, with

$$F(x, \infty) = F(\infty, x) = x \qquad (0 \le x \le 1).$$

Since the probability of $(X, Y)$ lying on any one of the segments (B1) or (B2)
is $1/(n - 1)$, the probabilities $P(\Pi \mid A_n)$ and $P(\Pi \mid B_n)$ are easily obtained in
terms of the multinomial distribution with $n - 1$ equal probabilities. In
particular, we have

$$P(1, 2, \ldots, n \mid A_2) = 1; \qquad P(n, n - 1, \ldots, 1 \mid B_2) = 1, \tag{B3}$$

$$P(1, 2, \ldots, n \mid A_n) = P(n, n - 1, \ldots, 1 \mid B_n) = \left(\frac{1}{n-1}\right)^{n-1}, \tag{B4}$$

$$P(n, n - 1, \ldots, 1 \mid A_n) = P(1, 2, \ldots, n \mid B_n) = 0.$$

In general, if $\Pi_\nu$ is any permutation of $1, \ldots, n$, we have either $P(\Pi_\nu \mid A_n) = 0$
or $P(\Pi_\nu \mid B_n) = 0$. For any $\Pi_\nu$ with $P(\Pi_\nu \mid A_n) \ne 0$ contains at least one
"run up" of 2 or more numbers (a sequence of consecutive numbers
$i, i + 1, \ldots, i + h$) which is not preceded by smaller numbers or followed by
larger numbers. On the other hand, if a $\Pi_\nu$ with $P(\Pi_\nu \mid B_n) \ne 0$ contains a
"run up", it is either preceded by smaller numbers or followed by larger numbers.
Hence if $P(\Pi_\nu \mid A_n) \ne 0$, then $P(\Pi_\nu \mid B_n) = 0$. Similarly, $P(\Pi_\nu \mid B_n) \ne 0$
implies $P(\Pi_\nu \mid A_n) = 0$.

From (B3) it follows that for any set $S_m$ of $m$ rankings which does not include
$(1, 2, \ldots, n)$ or $(n, n - 1, \ldots, 1)$ we have either $P(S_m \mid A_2) = 0$ or
$P(S_m \mid B_2) = 0$. Hence we need only consider critical regions containing both
$(1, 2, \ldots, n)$ and $(n, n - 1, \ldots, 1)$. For $m = 1$ there are no such regions. For
$m = 2$ there is just one. But from (B4) it follows that for $n > 2$,

$$P(1, 2, \ldots, n \mid A_n) + P(n, n - 1, \ldots, 1 \mid A_n) = \left(\frac{1}{n-1}\right)^{n-1} < \frac{2}{n!}.$$

Finally, if $\Pi_\nu$ is any permutation other than $(1, 2, \ldots, n)$ or $(n, n - 1, \ldots, 1)$,
we have, by the preceding arguments, either for $A_n$ or for $B_n$,

$$P(1, 2, \ldots, n) + P(n, n - 1, \ldots, 1) + P(\Pi_\nu) = \left(\frac{1}{n-1}\right)^{n-1} < \frac{3}{n!}.$$
This completes the proof for d.f.'s in $\Omega'$. To prove the theorem for d.f.'s in
$\Omega''$ we can replace the distributions $A_n$ and $B_n$ by distributions $A'_n$ and $B'_n$ having
continuous joint and marginal densities and such that the probabilities $P(\Pi \mid A'_n)$
and $P(\Pi \mid B'_n)$ differ as little as we please from $P(\Pi \mid A_n)$ and $P(\Pi \mid B_n)$,
respectively. For instance, $A'_2$ can be defined by the continuous density

$$f(x, y) = \begin{cases}
K(\epsilon - y + x) & \text{if } 0 \le y - x \le \epsilon,\ x \le 1 - \epsilon,\ y \ge \epsilon; \\
K(\epsilon - x + y) & \text{if } -\epsilon \le y - x \le 0,\ x \ge \epsilon,\ y \le 1 - \epsilon; \\
K(x + y - \epsilon) & \text{if } x + y \ge \epsilon,\ x \le \epsilon,\ y \le \epsilon; \\
K(2 - \epsilon - x - y) & \text{if } x + y \le 2 - \epsilon,\ x \ge 1 - \epsilon,\ y \ge 1 - \epsilon; \\
0 & \text{elsewhere},
\end{cases}$$

where $K = 3/(3\epsilon^2 - 4\epsilon^3)$ and $0 < \epsilon < \frac{1}{2}$. If $\epsilon$ is taken sufficiently small, the
distribution satisfies the requirements. The details are left to the reader.

The proof also shows the non-existence of an unbiased rank test of
independence for $n = 2$ and any level of significance (for we need consider only one
level, $\frac{1}{2}$). It also can be shown that for $n = 3$, any $m = 1, 2, \ldots, 5$ and any
$S_m$ the inequality $P(S_m) < m/3!$ holds for at least one of the distributions
$A_2$, $A_3$, $B_2$, $B_3$. The question remains open whether there exist rank tests of
independence which are unbiased for some sample sizes $n$ and some significance
levels $m/n!$.



ON PREDICTION IN STATIONARY TIME SERIES


By Herman O. A. Wold
Uppsala University 


Summary. In time series analysis there are two lines of approach, here called 
the functional and the stochastic. In the former case, the given time series is 
interpreted as a mathematical function, in the latter case as a random specimen 
out of a universe of mathematical functions. The close relation between the 
two approaches is in section 2 shown to amount to a genuine isomorphism. 
Considering the problem of prediction from this viewpoint, the author gives in 
sections 3-4 the functional equivalence of his earlier theorem on the decom- 
position of a stationary stochastic process with a discrete time parameter (see [9], 
theorem 7). In section 5 the decomposition theorem is applied to the problem 
of linear prediction. Finally in section 6 a few comments are made. Since 
various aspects of the isomorphism in question are known, this paper might be 
regarded as essentially expository. 


1. Introductory. Let the sequence

$$\ldots,\ x_{t-1},\ x_t,\ x_{t+1},\ \ldots \tag{1}$$

be an empirical time series such that no clear trend is present in the average
level, in the variance or in any other structural properties of the series which we
might choose to consider. Such series are usually called stationary, as distinct
from evolutive, terms which of course are somewhat loose when referring to
empirical data. We shall consider two approaches in the theoretical analysis of
stationary series. It is convenient to allow $x_t$ to be complex; the conjugate
complex of $x_t$ is denoted $\bar{x}_t$.

In the functional approach, the sequence (1) is regarded as forming an infinite
sequence, say $\{x_t\}$, where $t$ runs from $-\infty$ to $+\infty$. To define stationarity, let
us for any infinite sequence $\{z_t\}$ write

$$M[z_t] = \lim \frac{1}{t_2 - t_1 + 1} \sum_{t = t_1}^{t_2} z_t \qquad (t_1 \to -\infty,\ t_2 \to +\infty). \tag{2}$$

The limit $M[z_t]$, which will be called "the average of $z_t$", is clearly independent
of $t$. It is also seen that a necessary and sufficient condition for $M[z_t]$ to exist is
that the same average should be obtained when $t_1$ is kept fixed while $t_2 \to +\infty$,
and when $t_2$ is kept fixed while $t_1 \to -\infty$. The stationarity of the sequence (1)
may now be brought out by assumptions of the type that the averages $M[x_t]$ and
$M[x_t \cdot \bar{x}_{t+k}]$ exist, say

$$M[x_t] = m, \qquad M[x_t \cdot \bar{x}_{t+k}] = r_k \qquad (k = 0, \pm 1, \pm 2, \ldots). \tag{3}$$
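
The averages in (2) and (3) can be illustrated on a finite window. The sketch below is only an approximation to the limit in (2): the window $[t_1, t_2]$ is finite, the function names are hypothetical, and the test sequence $x_t = \cos t$ is an arbitrary choice for which the limits are $m = 0$ and $r_k = \frac{1}{2}\cos k$.

```python
import math


def window_average(z, t1, t2):
    """Finite-window analogue of M[z_t]: the mean of z(t) for t1 <= t <= t2."""
    return sum(z(t) for t in range(t1, t2 + 1)) / (t2 - t1 + 1)


def lagged_product_average(x, k, t1, t2):
    """Finite-window analogue of r_k = M[x_t * conj(x_{t+k})]."""
    return window_average(lambda t: x(t) * x(t + k).conjugate(), t1, t2)


def cosine_series(t):
    """Example sequence x_t = cos t (an assumed test case, not from the text)."""
    return math.cos(t)
```

With the window $[-5000, 5000]$, `window_average(cosine_series, -5000, 5000)` is close to 0 and `lagged_product_average(cosine_series, 3, -5000, 5000)` is close to $\frac{1}{2}\cos 3$; the discrepancy shrinks in proportion to the reciprocal of the window length.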


In the stochastic (or probabilistic) approach, we introduce an infinite sequence
of random variables, say

$$\ldots,\ \xi_{t-1},\ \xi_t,\ \xi_{t+1},\ \ldots \qquad (-\infty < t < +\infty), \tag{4}$$

or briefly $\{\xi_t\}$. The sequence $\{\xi_t\}$ may be regarded as the generalization of the
notion of multi-dimensional variable, say $[\xi_1, \ldots, \xi_n]$, to an infinite number of
components $\xi_t$. According to a basic theorem by A. Kolmogoroff (see e.g. [9],
§11), the probability distribution of the sequence $\{\xi_t\}$ may be defined by
specifying for any finite set of variables, say $[\xi_{t_1}, \ldots, \xi_{t_n}]$, its multi-dimensional
distribution function, say

$$F(u_1, \ldots, u_n;\ t_1, \ldots, t_n) = \operatorname{Prob}(\xi_{t_1} \le u_1, \ldots, \xi_{t_n} \le u_n). \tag{5}$$

The sequence $\{\xi_t\}$ thus defined is said to constitute a stochastic process. As is
sufficient for our purpose, we confine ourselves to the case when the time
parameter $t$ is restricted to discrete values, $t = 0, \pm 1, \pm 2, \ldots$.


Fig. 1


Now in the stochastic approach, the empirical time series (1) is regarded as a
sample specimen, a realization, of the stochastic process $\{\xi_t\}$, just as a point
$[x_1, \ldots, x_n]$ in an $n$-dimensional space may be regarded as a sample specimen
of a multidimensional variable $[\xi_1, \ldots, \xi_n]$. In line with this interpretation, the
process $\{\xi_t\}$ may be regarded as a universe of individual realizations such as (1)
(see the graph). Taking out a realization at random from this universe, we shall
have the probability,

$$F(u_1;\ t_1) = \operatorname{Prob}(\xi_{t_1} \le u_1),$$

that the value taken on by the realization at the time point $t_1$ will be $\le u_1$;
similarly,

$$F(u_1, u_2;\ t_1, t_2) = \operatorname{Prob}(\xi_{t_1} \le u_1,\ \xi_{t_2} \le u_2),$$

is the joint probability that the values taken on by the realization at $t_1$ and $t_2$
will be $\le u_1$ and $\le u_2$ respectively.

Any expectation referring to the variables (4) may be expressed in terms of the
distribution functions (5), for instance

$$E[\xi_t] = \int u\, d_u F(u;\ t), \qquad E[\xi_{t_1} \cdot \xi_{t_2}] = \iint u\,v\; d_{u,v} F(u, v;\ t_1, t_2).$$

Again interpreting in terms of the universe of realizations, $E[\xi_t]$, say, is the
average, over this universe, of the value taken by the realizations at the time point $t$.

The above definition of a stochastic process (4) being perfectly general, we have
to impose special assumptions if we wish to take into account particular
properties of the given time series (1). Thus stationarity of the process (4) may be
defined by assuming that any probability of the type (5) will remain the same
if $t_1, \ldots, t_n$ is replaced by $t_1 + t, \ldots, t_n + t$, where $t$ is arbitrary. Alternatively,
and more generally, the stationarity of the sequence (1) may be brought out in
this approach by assuming that the expectations

$$E[\xi_t] = \mu, \qquad E[\xi_t \cdot \bar{\xi}_{t+k}] = \rho_k$$

exist and are independent of $t$.


2. The functional and stochastic approaches are closely related as to problems
and results. A typical example is that $r_k$ and $\rho_k$ as defined above allow the
representations¹

$$r_k = \int e^{ik\lambda}\,dF(\lambda), \qquad \rho_k = \int e^{ik\lambda}\,d\Phi(\lambda) \qquad (k = 0, \pm 1, \pm 2, \ldots), \tag{6}$$

where $F(\lambda)$ and $\Phi(\lambda)$ are real, bounded and never decreasing functions. We
shall now show that the parallelism between the two approaches amounts to a
mathematical isomorphism. On the one hand, we recall that A. Kolmogoroff
[3], [4] has introduced and studied the notion of a stationary sequence in Hilbert
space (let such a sequence be denoted $\{X_t\}$), and shown that a stationary
stochastic process $\{\xi_t\}$ forms a particular realization of this general, abstract
$\{X_t\}$. On the other hand the following elementary lemma shows that another
realization of $\{X_t\}$ may be formed on the basis of a stationary sequence $\{x_t\}$
such as (1).


1 As to $r_k$, see N. Wiener [8], who treats the case of a continuous time parameter $t$.
As to $\rho_k$, see H. Wold [9], p. 66, and A. Kolmogoroff [4], p. 5.


LEMMA. Let $\{x_t\}$ be a sequence of type (1) which satisfies the conditions (3) but
is arbitrary in other respects. We write

$$\{\mathbf{x}_t\} = \ldots,\ \mathbf{x}_{t-1},\ \mathbf{x}_t,\ \mathbf{x}_{t+1},\ \ldots, \tag{7}$$

where $\mathbf{x}_t = \{x_t\}$, and $\mathbf{x}_{t+k}$ is obtained from $\mathbf{x}_t$ by replacing $x_t$ by $x_{t+k}$ for every $t$.
For the elements $\mathbf{x}_t$, let multiplication by a real or complex constant and addition
be defined by

$$a\mathbf{x}_t = \{a x_t\}, \qquad \mathbf{x}_t + \mathbf{y}_t = \{x_t + y_t\},$$

and let $R$ be the class formed by all elements of the type

$$c_{-n}\mathbf{x}_{t-n} + c_{-n+1}\mathbf{x}_{t-n+1} + \cdots + c_0\mathbf{x}_t + \cdots + c_n\mathbf{x}_{t+n},$$

where $n$ and $c_{-n}, \ldots, c_n$ are arbitrary. Let the inner product $(\mathbf{x}_t, \mathbf{y}_t)$ of two
elements $\mathbf{x}_t = \{x_t\}$, $\mathbf{y}_t = \{y_t\}$ in $R$ be defined by

$$(\mathbf{x}_t, \mathbf{y}_t) = M[x_t \cdot \bar{y}_t],$$

and let $R'$ be the closure of $R$.

Then $R'$ is a space the dimension of which is denumerable or finite. In the
former case, $R'$ satisfies the conditions of a Hilbert space $H$, in the latter case it can
be extended to a Hilbert space $H$. In any case, the relations

$$U\mathbf{x}_t = \mathbf{x}_{t+1}, \qquad -\infty < t < +\infty, \tag{8}$$

define a unitary transformation $U$ in $H$.

The first statement of the theorem is obvious. It is also easily verified that
$R'$ satisfies the conditions A-C of an abstract Hilbert space as defined by
B. v. Sz. Nagy [7]. If $R'$ is of finite dimension, a suitable extension will make $R'$
satisfy the conditions A-E of a Hilbert space as defined by M. H. Stone [6].
The transformation $U$ is clearly unitary; it is also plain that the definition (8)
of $U$ extends to the whole of $H$.

Now since both (4) and (7) are particular realizations of a stationary sequence
$\{X_t\}$ in Hilbert space, any theorem on such a sequence $\{X_t\}$ will give, as
immediate corollaries, similar theorems on a stationary sequence $\{x_t\}$ of type (1) and
on a stationary stochastic process $\{\xi_t\}$. Generally speaking, the former
corollary will involve averages of one or more functional sequences $\{x_t\}, \{y_t\}, \ldots$
over time $t$, while the latter will involve averages, for fixed $t$, over the realizations
of one or more stochastic processes $\{\xi_t\}, \{\eta_t\}, \ldots$.

Let us consider the following problem of prediction in the light of the
isomorphism established: Suppose the data (1) are known up to $t - 1$, say for
$t - 1, t - 2, \ldots, t - n$; what can then be said about $x_t$, or, more generally,
about $x_{t+k}$? One approach to the problem is to apply harmonic analysis to the
given data, and to extrapolate the function obtained up to the time point $t + k$.
Another approach, the one which we shall consider, is to approximate $x_{t+k}$
directly in terms of the given data. Confining ourselves to linear prediction,
and making use of $n$ observations, the prediction formula will then be

$$\text{pred. } x_{t+k} = a_1^{(n,k)} x_{t-1} + a_2^{(n,k)} x_{t-2} + \cdots + a_n^{(n,k)} x_{t-n}. \tag{9}$$

The error of prediction, also called the residual, is denoted

$$y_{t+k}^{(n,k)} = x_{t+k} - \text{pred. } x_{t+k}. \tag{10}$$

Considering first the functional approach, we apply formula (9) for all $t$,
thus obtaining the residuals

$$\ldots,\ y_{t-1}^{(n,k)},\ y_t^{(n,k)},\ y_{t+1}^{(n,k)},\ \ldots$$

In this approach we are led to regard the residual variance, i.e.

$$M[\,|y_t^{(n,k)}|^2\,], \tag{11}$$

as a total measure of the accuracy of the prediction. If we follow the stochastic
approach, on the other hand, the formula (9) is applied, for fixed $t$, to all
realizations $\{x_t\}$ of the process $\{\xi_t\}$. In this case, the variance expectation,

$$E[\,|y_t^{(n,k)}|^2\,], \tag{12}$$

is regarded as a total measure of the accuracy of the prediction. The prediction
coefficients $a_i^{(n,k)}$ are determined by minimizing the expressions (11) and (12),
respectively.² It needs no further comment that the two lines of approach in
prediction theory will, thanks to the isomorphism indicated, lead to parallel
results.

In a study of stationary stochastic processes, the author has earlier found a 
decomposition theorem which has a direct bearing on the prediction problem 
(see [9], theorem 7). The main purpose of the present note is to develop the 
corresponding decomposition for a functional sequence of the type (1). Two 
theorems on this line are given in sections 3-4. The proofs are briefly indicated; 
for further details, the reader is referred to my treatment on the stationary 
process [9]. In section 5, the decomposition is applied to the prediction problem.
A few comments follow in section 6. 


3. Auto-regression analysis of stationary time series. Let $\{x_t\}$ be an infinite
sequence (1) such that the conditions (3) are fulfilled. By (9)-(10), the residuals
$y_t^{(n,0)}$ will be well-defined for every $n$ and $t$. According to elementary
properties of least square residuals, we have

(13) $M[y_t^{(n,0)}] = 0; \quad M[y_t^{(n,0)}\,\bar{x}_{t-k}] = 0 \ \text{ for } k = 1, 2, \ldots, n.$

Since the minimum variance cannot increase if we replace $n$ by $n+1$, we further
have

$M[\,|x_t|^2\,] \ge M[\,|y_t^{(1,0)}|^2\,] \ge M[\,|y_t^{(2,0)}|^2\,] \ge \cdots \ge 0.$

Making $n \to \infty$, we infer that there is a constant $d^2$ such that

$\lim_{n\to\infty} M[\,|y_t^{(n,0)}|^2\,] = d^2 \ge 0.$


2 For real sequences $\{x_t\}$ and $\{\xi_t\}$, this minimization is, of course, nothing else than the
method of least squares.





Making use of the Gram-Schmidt orthogonalization procedure, it is further
possible to show that there exists a sequence $\{y_t\}$ such that

$\lim_{n\to\infty} M[\,|y_t^{(n,0)} - y_t|^2\,] = 0.$

In the usual terminology, the sequence $\{y_t\}$ is the limit in the mean of the
sequence $\{y_t^{(n,0)}\}$,

(14) $\underset{n\to\infty}{\mathrm{l.i.m.}}\ (\ldots, y_{t-1}^{(n,0)}, y_t^{(n,0)}, y_{t+1}^{(n,0)}, \ldots) = (\ldots, y_{t-1}, y_t, y_{t+1}, \ldots).$

We may remark that (14) does not necessarily imply that $y_t^{(n,0)}$ will for a fixed
$t$ have $y_t$ for an ordinary limit. We also note that the limiting sequence $\{y_t\}$
is not uniquely determined; for instance, the relation (14) remains valid if a
finite number of the elements $y_t$ are modified.

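As is easily shown, we have

(15) $\lim_{n\to\infty} M[\,|y_t^{(n,0)}|^2\,] = M[\,|y_t|^2\,] = M[y_t\,\bar{y}_t] = d^2 \ge 0,$

and [cf. (13)]

(16) $M[y_t\,\bar{x}_{t-k}] = 0, \quad k = 1, 2, \ldots.$

Moreover, the sequence $\{y_t\}$ is non-autocorrelated, i.e.

(17) $M[y_t\,\bar{y}_{t+k}] = 0, \quad k = \pm 1, \pm 2, \ldots.$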


In fact, observing that

$M[y_t\,\bar{y}_{t-k}] = \lim_{n\to\infty} M[y_t^{(n,0)}\,\bar{y}_{t-k}^{(n,0)}], \quad k = 1, 2, \ldots,$

and supposing that (17) is not true, we would have

(18) $|M[y_t^{(\nu,0)}\,\bar{y}_{t-k}^{(\nu,0)}]| > a > 0,$

as $\nu$ runs through some sequence $n_1, n_2, \ldots$, such that $n_i \to \infty$. The relation
(18), however, would imply

(19) $M[\,|y_t^{(\nu,0)} - c\,y_{t-k}^{(\nu,0)}|^2\,] < d^2(1 - 4a^2)$

for some sufficiently large $\nu$ and for some suitable $c$. Since $y_t^{(\nu,0)} - c\,y_{t-k}^{(\nu,0)}$ is
$x_t$ less a linear expression of the type appearing in the right hand member of (9), the
relation (19) is incompatible with (15). Thus (18) is not possible and (17) must
hold good.

Part of the above analysis is summed up in

THEOREM 1. Given a time series $\{x_t\}$ which satisfies (3), let $\epsilon > 0$ be arbitrary.
Then an integer $n$ and a set of coefficients $a_i^{(n,0)}$ exist for which (9) defines a residual
series $\{y_t^{(n,0)}\}$ such that

$M[y_t^{(n,0)}] = 0, \quad |M[y_t^{(n,0)}\,\bar{y}_{t+k}^{(n,0)}]| < \epsilon, \quad k = \pm 1, \pm 2, \ldots.$
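
The auto-regression analysis of this section can be illustrated numerically. The following is a minimal sketch, assuming a real-valued series (so that conjugation is immaterial) and replacing the averages M[.] over all time by averages over a finite stretch of data; the test series, its length and the helper names `solve`, `mean` and `autoregress` are assumptions made for the illustration and do not come from the paper. It fits the coefficients of (9) with k = 0 by least squares, and then checks the orthogonality relations (13) and the fact that the residual variance does not increase with n.

```python
import random

def solve(A, b):
    """Solve the linear system A a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (M[i][n] - sum(M[i][j] * a[j] for j in range(i + 1, n))) / M[i][i]
    return a

def mean(values):
    """Finite-sample stand-in for the time average M[.]."""
    values = list(values)
    return sum(values) / len(values)

def autoregress(x, n, start):
    """Least-squares coefficients a_1..a_n of (9) with k = 0, and residuals y_t, t >= start."""
    ts = range(start, len(x))
    A = [[mean(x[t - i] * x[t - j] for t in ts) for j in range(1, n + 1)]
         for i in range(1, n + 1)]
    b = [mean(x[t] * x[t - i] for t in ts) for i in range(1, n + 1)]
    a = solve(A, b)
    y = [x[t] - sum(a[i - 1] * x[t - i] for i in range(1, n + 1)) for t in ts]
    return a, y

# Hypothetical test series (an assumption, not data from the paper).
random.seed(0)
x = [0.0, 0.0]
for _ in range(2000):
    x.append(0.6 * x[-1] - 0.3 * x[-2] + random.gauss(0.0, 1.0))

N_MAX = 4
variances, max_cross = [], []
for n in range(1, N_MAX + 1):
    a, y = autoregress(x, n, start=N_MAX)      # same time points for every n
    variances.append(mean(v * v for v in y))   # residual variance for this n
    # Relations (13): average of y_t * x_{t-k} for k = 1..n
    cross = [mean(y[t - N_MAX] * x[t - k] for t in range(N_MAX, len(x)))
             for k in range(1, n + 1)]
    max_cross.append(max(abs(c) for c in cross))
    print(n, variances[-1], max_cross[-1])
```

Fitting every order n on the same time points keeps the successive least-squares problems nested, which is what makes the printed variances non-increasing.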
4. A decomposition theorem. We shall first consider the special case where
(15) gives

(20) $M[\,|y_t|^2\,] = d^2 = 0,$

which is the same as

$\underset{n\to\infty}{\mathrm{l.i.m.}}\ (\ldots, y_{t-1}^{(n,0)}, y_t^{(n,0)}, \ldots) = (\ldots, 0, 0, \ldots).$

In this case we shall say that the sequence $\{x_t\}$ is deterministic,³ the
interpretation of this term being as follows: Given the sequence $\{x_t\}$ for all time points up
to and including $t-1$, we may, by the use of a finite number of the given values,
predict $x_{t+k}$ with any accuracy; i.e., with a residual error of arbitrarily small
variance. This can be shown by induction. In fact, suppose that we are able
to predict each of $x_t, \ldots, x_{t+k-1}$ in such a way that the prediction error has a
variance $< \epsilon$, where $\epsilon$ is arbitrarily prescribed. Letting $\delta > 0$ be arbitrary, we
can then find a formula of type (9) which predicts $x_{t+k}$ in terms of the exact
values $x_{t+k-1}, x_{t+k-2}, \ldots$ and which gives a residual variance $< \delta/(k+1)$.
Replacing here $x_{t+k-1}, \ldots, x_t$ by values so predicted that the residual variances
are less than $\delta/(k+1)|a_1^{(n,0)}|, \ldots, \delta/(k+1)|a_k^{(n,0)}|$, it is seen that the total
error of (9) will have a variance $< \delta$.

We proceed to the general case, $d^2 > 0$. According to the above analysis,
$y_t$ is that part of $x_t$ which cannot be linearly predicted from the previous
observations $x_{t-1}, x_{t-2}, \ldots$. In other words, each time point $t$ brings in an
unpredictable, random-like element $y_t$ in the series $\{x_t\}$. Now while from (16) $y_t$ is
uncorrelated with the previous observations $x_{t-1}, x_{t-2}, \ldots$, it will in general be
correlated with the future observations $x_{t+1}, x_{t+2}, \ldots$. Thus the
unpredictable element $y_t$ may be regarded as influencing the future development
$x_{t+1}, x_{t+2}, \ldots$ of the series $\{x_t\}$. In order to examine this influence we proceed
as follows.

We approximate $x_t$ linearly in terms of $y_t, y_{t-1}, \ldots, y_{t-n}$, writing

$x_t = b_0 y_t + b_1 y_{t-1} + \cdots + b_n y_{t-n} + u_t^{(n)} = z_t^{(n)} + u_t^{(n)}.$

Determining the coefficients $b_k$ by minimizing

$M[\,|x_t - z_t^{(n)}|^2\,],$

the coefficients $b_k$ will thanks to (16)-(17) be independent of $n$. We obtain

$b_0 = 1; \quad b_k = M[x_t\,\bar{y}_{t-k}]/d^2, \quad k = 1, 2, \ldots.$

The sequence $\{z_t^{(n)}\}$ thus being determined for every $n$, it is further easily shown
that $\{z_t^{(n)}\}$ converges in the mean, say to $\{z_t\}$,

(21) $\underset{n\to\infty}{\mathrm{l.i.m.}}\ (\ldots, z_{t-1}^{(n)}, z_t^{(n)}, \ldots) = (\ldots, z_{t-1}, z_t, \ldots).$





3 The term is due to J. Doob [1]; in my study [9] I used the term singular. 
We may thus write

$z_t = y_t + b_1 y_{t-1} + b_2 y_{t-2} + \cdots,$

where the sum converges in the mean. Finally, we write

(22) $x_t = z_t + u_t,$

which gives a decomposition of the series $\{x_t\}$ into two components $\{z_t\}$ and
$\{u_t\}$.

In the decomposition (22) the component $z_t$ is that part of $x_t$ which is linearly
built up by the unpredictable elements $\{y_t\}$ up to and including the time point
$t$. From (17) we know that the sequence $\{y_t\}$ is non-autocorrelated. It can
further be shown that the square modulus sum of the coefficients $b_k$ is convergent,

$\sum_k |b_k|^2 < \infty.$


As to the component $u_t$, it can be shown that $\{u_t\}$ is deterministic. More
precisely, we have

$\underset{n\to\infty}{\mathrm{l.i.m.}}\ \{u_t - (a_1^{(n,0)} u_{t-1} + a_2^{(n,0)} u_{t-2} + \cdots + a_n^{(n,0)} u_{t-n})\} = \{0\},$

where the $a_i^{(n,0)}$ are the same as the minimizing coefficients of (9). It can further
be shown that $u_t$ is uncorrelated with $y_{t+k}$ and $z_{t+k}$ for all $k$,

$M[u_t\,\bar{y}_{t+k}] = M[u_t\,\bar{z}_{t+k}] = 0, \quad (k = 0, \pm 1, \pm 2, \ldots).$


Summing up the above results, we obtain

THEOREM 2. Any time series $\{x_t\}$ which satisfies the conditions (3) allows the
decomposition

(23) $\{x_t\} = \{z_t + u_t\},$

with

$\{z_t\} = \underset{n\to\infty}{\mathrm{l.i.m.}}\ \{y_t + b_1 y_{t-1} + b_2 y_{t-2} + \cdots + b_n y_{t-n}\},$

where the series $\{y_t\}$, $\{z_t\}$ and $\{u_t\}$ have the following properties.

A. The elements $y_t$, $z_t$ and $u_t$ are obtained from $x_t, x_{t-1}, \ldots$ by the limit
formulae (14), (21) and (22).

B. The series $\{y_t\}$ has zero mean,

$M[y_t] = 0,$

is non-autocorrelated,

$M[y_t\,\bar{y}_{t+k}] = 0, \quad k = \pm 1, \pm 2, \ldots,$

and is uncorrelated with $\{x_{t-1}\}$, $\{x_{t-2}\}$, ...,

$M[y_t\,\bar{x}_{t-k}] = 0, \quad k = 1, 2, \ldots.$

C. The series $\{u_t\}$ is uncorrelated with $\{y_t\}$ and $\{z_t\}$,

$M[u_t\,\bar{y}_{t+k}] = M[u_t\,\bar{z}_{t+k}] = 0, \quad (k = 0, \pm 1, \pm 2, \ldots).$

D. The series $\{u_t\}$ is deterministic.


5. Application to the problem of prediction. In section 1 we have considered
the problem of predicting $x_{t+k}$ linearly in terms of $x_{t-1}, x_{t-2}, \ldots$. Now it is
seen that theorem 2 gives the following formula for predicting $x_{t+k}$ with an error
of minimal variance,

$\text{pred. } x_{t+k} = u_{t+k} + b_{k+1} y_{t-1} + b_{k+2} y_{t-2} + \cdots.$

In fact, by theorem 2, A and D, the right-hand member can be calculated with
any prescribed accuracy from a finite set of observations $x_{t-1}, x_{t-2}, \ldots, x_{t-N}$,
where $N$ of course depends on the accuracy desired; on the other hand, the
prediction error being

$y_{t+k} + b_1 y_{t+k-1} + \cdots + b_k y_t,$

we infer from theorem 2 (B) that this error is of minimal variance,

$M[\,|x_{t+k} - \text{pred. } x_{t+k}|^2\,] = (1 + |b_1|^2 + \cdots + |b_k|^2)\,d^2.$
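
The variance formula above can be checked on a simple case. The sketch below assumes a first-order autoregressive scheme $x_t = \phi x_{t-1} + y_t$ with $|\phi| < 1$, which is an example chosen for the illustration and is not discussed in the paper. For that scheme the coefficients of the decomposition are $b_j = \phi^j$, the deterministic component vanishes, and the best linear prediction of $x_{t+k}$ from $x_{t-1}, x_{t-2}, \ldots$ is $\phi^{k+1} x_{t-1}$, so the error variance can also be written down directly from the autocovariances. The values of `phi` and `d2` are hypothetical.

```python
def prediction_error_variance(b, k, d2):
    """(1 + |b_1|^2 + ... + |b_k|^2) * d^2, where b[0] = 1."""
    return d2 * sum(abs(b[j]) ** 2 for j in range(k + 1))

phi, d2 = 0.8, 1.0                      # hypothetical AR(1) parameter and innovation variance
b = [phi ** j for j in range(20)]       # b_j = phi**j for the AR(1) scheme
r0 = d2 / (1.0 - phi ** 2)              # variance of x_t under the AR(1) scheme

comparison = []
for k in range(4):
    from_formula = prediction_error_variance(b, k, d2)
    # Variance of x_{t+k} - phi**(k+1) * x_{t-1}, computed from the autocovariances
    direct = r0 * (1.0 - phi ** (2 * (k + 1)))
    comparison.append((k, from_formula, direct))
    print(k, from_formula, direct)
```

For k = 0 both expressions reduce to $d^2$, the variance of the unpredictable element itself.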


6. Comments. As mentioned in section 2, the above theorem 2 is the analogue
of a theorem on the decomposition of a stationary stochastic process given by the
author previously (see [9], theorem 7). The starting point is then to apply
formula (9), not as above to the same sequence $\{x_t\}$ for varying $t$, but to all
realizations $\{x_t\}$ of the process, holding $t$ fixed. The close connection between
the decomposition in the two approaches is further brought out by the following
theorem.

THEOREM 3. Given a stochastic process,

$\ldots,\ \xi(t-1),\ \xi(t),\ \xi(t+1),\ \ldots$

which is stationary in the sense of (5), let $\{x_t\}$ be an individual realization of this
process. Then $\{x_t\}$ will with probability 1 allow the decomposition of theorem 2.

In fact, according to the ergodic theorem of Birkhoff-Khintchine,⁴ the averages
(2) will exist with probability 1, and so theorem 3 follows from theorem 2. It
should be observed that the coefficients $b_k$ will in general vary from one
realization to another.

The theory of the decomposition (23) has been carried further in a brilliant 
study by A. Kolmogoroff [3]. His analysis deals with the general case of a 
stationary sequence in a Hilbert space. Establishing a decomposition of type (23)





4 See A. Kolmogoroff [2]. His proof refers to averages (2) of the special type where
$t_1$ is held fixed while $t_2 \to \infty$. According to the stationarity, however, the average exists,
and is the same, when $t_2$ is fixed and $t_1 \to -\infty$, and so the general average (2) will likewise
exist.


for such sequences Kolmogoroff also shows that the decomposition is uniquely
determined by properties corresponding to A-D. Making use of the powerful
methods of spectral analysis of linear transformations in Hilbert space,
Kolmogoroff further presents a highly developed theory of the decomposition.

As immediate corollaries of this general theory Kolmogoroff [4] obtains
corresponding results for a stationary stochastic process $\{\xi_t\}$ such as (4). Now thanks
to our lemma in section 2, similar theorems hold good for the functional sequence
(1). These results include detailed theorems on the connection between the
decomposition (23) and, on the other hand, the function $F(\lambda)$ which by (6)
generates the coefficients $r_k$. For example, it turns out that $\{x_t\}$ is completely
deterministic if the derivative $F'(\lambda)$ is constant over an interval of positive
measure. An explicit formula for the coefficients $b_k$ in terms of the function
$F(\lambda)$ may also be obtained. For proofs and further results, we must refer to
Kolmogoroff's papers [3]-[4].

The theory of the decomposition (23) has later been generalized in various 
directions. V. Zasuhin [11] and J. Doob [1] have shown that the decomposition
applies to multi-dimensional stationary sequences. As shown by the present 
author [10], the decomposition may be employed for the analysis of linear equa- 
tion systems with an infinite number of unknowns. This device makes use of 
the decomposition of non-stationary sequences, a generalization indicated also 
by M. Loéve [5]. 


GENERALIZATION TO N DIMENSIONS OF INEQUALITIES OF
THE TCHEBYCHEFF TYPE


By Burton H. Camp


Wesleyan University 


1. Summary. The Tchebycheff statistical inequality and its generalizations 
are further generalized so as to apply equally well to n-dimensional probability 
distributions. Comparisons may be made with other generalizations [1], [2] 
that have been developed recently for the two-dimensional case. The inequal- 
ities given in this paper are generally as close as the most favorable corresponding 
inequalities that exist for the one-dimensional case and in many simple cases 
they are closer than those that have been given heretofore for two dimensions. 
In a special case the upper bound of our inequality is actually attained. The 
theory contains also a less important generalization in one dimension. 


2. Introduction. It is necessary to introduce a new kind of moment, to be 
called a ‘‘contour” moment, which is a generalization of the usual one-dimensional 
moment. If we consider first a simple two-dimensional frequency surface,
$y = f(t_1, t_2)$, we may think of $y$ as a function of a single variable, $x$, where $x$ is the
area of the contour on that surface at the $y$ level. This function may be defined
so that it is monotonic decreasing and has other simple characteristics. Then
we define the $r$th contour moment as

$\hat\mu_r = \int_0^\infty x^r y\,dx,$


and then the generalization of the Tchebycheff-type inequalities follows easily. 
This theory can be applied equally well to almost any single-valued function of 
n variables which is limited and integrable in the sense of Lebesgue. Therefore 
the theory will be enunciated initially in a very general form. The reasons for 
the initial statements will be indicated only briefly because a detailed discussion 
of quite similar ideas has been given by this author in another paper [3], where 
he applied the same general principle to obtain generalizations of certain theo- 
rems in integration theory. 


3. Preliminary theory. Let $f(t_1, \ldots, t_n)$ be a probability distribution with
limited upper bound $L$ and defined at all points of infinite $n$-space, which is to be
denoted by $T$, $dT$ being the Lebesgue measure of a differential element. We
thus assume that: $0 \le f(t_1, \ldots, t_n) \le L$, $f$ has a Lebesgue integral in $T$, and

$\int_T f\,dT = 1.$

Let $Q_\lambda$ denote the set of points in $T$ where $f > \lambda$, $(0 \le \lambda \le L)$, and let $x_\lambda$ be the
measure of $Q_\lambda$, for $Q_\lambda$ is known to be measurable. Therefore $x_L = 0$, $x_0 \le \infty$,
and for each $\lambda$ there exists a unique $Q_\lambda$ and therefore a unique $x$. This means
that $x$ is a single-valued function of $\lambda$ and that it exists (or is positive infinite)
for every value of $\lambda$ in the interval $(0 \le \lambda \le L)$. If $\lambda' > \lambda$, $x_{\lambda'} \le x_\lambda$. This
means that $x$ is a monotonic decreasing function of $\lambda$. It need not be continuous;
that is, it may be asymptotic to the line $\lambda = 0$, and it may have finite
discontinuities or "jumps". Also there may be an enumerably infinite number of $\lambda$
intervals in which $x$ is constant. It follows that $\lambda$ is a monotonic decreasing
function of $x$ in the interval $(0 \le x \le x_0 \le \infty)$, but it may not exist (in intervals
where $x$ has jumps), and it may be multiple valued (at points where $x$ is constant).
We now let $y(x) = \lambda_x$, except that: if $\lambda$ is multiple valued at any point $x$ we
let $y$ have the minimum value of $\lambda$ at that point. Any other value would do
equally well because the total measure of such points is zero and they can be left
out of the integrals that follow. If $\lambda$ does not exist in an $x$ interval, we let $y$ have
in that interval the value which it has at the beginning of the interval. This is a
point where $x$ has a jump. We have thus defined $y$ as a single valued monotonic
decreasing function of $x$ in the interval $(0 \le x \le x_0 \le \infty)$ and $0 \le y \le L$.
It follows from Lebesgue's theory that:

$\int_0^{x_\lambda} y(x)\,dx = \int_{Q_\lambda} f\,dT, \ (0 < \lambda \le L); \qquad \int_0^{x_0} y(x)\,dx = \int_T f\,dT = 1.$

Finally we restrict our function $f$ so that there shall be at most a finite number
of points $x$ where $\lambda$ is multiple valued (intervals of $\lambda$ over which $x$ is constant),
and hence the number of discontinuities of $y$ will be finite. This restriction may
not be necessary but it is convenient and not embarrassing in applications.


4. Contour moments. The $r$th contour moment is denoted by $\hat\mu_r$. The
contour standard deviation is denoted by $\hat\sigma$. We define

$\hat\mu_r = \int_0^\infty x^r y\,dx.$

It follows that $\hat\mu_0 = 1$, and that

$\hat\mu_2 = \hat\sigma^2 = \int_0^\infty x^2 y\,dx.$

We shall also let $\hat\alpha_{2r} = \hat\mu_{2r}/\hat\sigma^{2r}$. We now assume that $r$ is either zero or a positive
integer, but in much of what follows this assumption is not necessary.


Example 1. Let $f(t_1, t_2) = (2\pi)^{-1} e^{-(t_1^2 + t_2^2)/2}$. The equation, $f(t_1, t_2) = \lambda$,
defines a circular contour whose area is $x = \pi(t_1^2 + t_2^2) = -2\pi \log 2\pi\lambda$. Hence
$y = \lambda = (2\pi)^{-1} e^{-x/2\pi}$, and

$\hat\mu_r = \int_0^\infty x^r y\,dx = (2\pi)^r\, r!, \quad \hat\sigma^2 = 8\pi^2, \quad \hat\alpha_{2r} = (2r)!/2^r.$
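
The closed forms of Example 1 can be confirmed by direct numerical integration. The sketch below is only an illustration: it uses Simpson's rule with an upper limit and step count chosen here (`upper`, `steps` are assumptions, large enough that the neglected tail is negligible for small r) and compares the result with $(2\pi)^r r!$.

```python
import math

def y(x):
    """Contour function of Example 1: y = (2*pi)**-1 * exp(-x / (2*pi))."""
    return math.exp(-x / (2 * math.pi)) / (2 * math.pi)

def contour_moment(r, upper=600.0, steps=60000):
    """Simpson's rule for the integral of x**r * y(x) over [0, upper]; steps must be even."""
    h = upper / steps
    f = lambda x: x ** r * y(x)
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

moments = [contour_moment(r) for r in range(5)]
for r, m in enumerate(moments):
    print(r, m, (2 * math.pi) ** r * math.factorial(r))

sigma2 = moments[2]                 # contour variance, expected 8 * pi**2
alpha4 = moments[4] / sigma2 ** 2   # expected (2r)!/2**r with r = 2, i.e. 6
print(sigma2, alpha4)
```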
0 


5. Contour moments and one-dimensional moments. If $n = 1$ and if $f(t_1) =
f(-t_1)$, then

$\hat\mu_{2r} = \int_0^\infty x^{2r} y\,dx = 2\int_0^\infty (2t_1)^{2r} f(t_1)\,dt_1 = \mu_{2r}\,2^{2r},$

where $\mu_{2r}$ is an ordinary moment. Hence also $\hat\sigma = 2\sigma$, $\hat\alpha_{2r} = \hat\mu_{2r}/\hat\sigma^{2r} = \mu_{2r}\,2^{2r}/
\sigma^{2r}\cdot 2^{2r} = \alpha_{2r}$. It is to be noticed that, although $\hat\alpha_{2r} = \alpha_{2r}$, $\hat\mu_{2r} \ne \mu_{2r}$. One
into the definition of contour moments the factor 2”, using x/2” in place of x, 
but this would introduce a slight complication for a doubtful advantage.
Although it would seem to be desirable to define the even contour moments $\hat\mu_{2r}$
so that they would become the ordinary moments $\mu_{2r}$ in the symmetrical
one-dimensional case, such a definition would not make the two corresponding odd
moments equal, and it would not make the two even moments equal in the non- 
symmetrical one-dimensional case. So it seems better not to introduce this 
factor 2”, but to take note of the relationships that hold in the one-dimensional 
case. 

THEOREM. Let

$P_\delta = \int_0^{x_d} y\,dx,$

where $d$ is such that $x_d = \delta\hat\sigma$. Then

$1 - P_\delta \le \hat\alpha_{2r} \Big/ \left(\delta\cdot\frac{2r+1}{2r}\right)^{2r}.$

COROLLARY 1. In particular $1 - P_\delta \le \hat\alpha_{2r}/\delta^{2r}$.

COROLLARY 2. If $r = 1$, $1 - P_\delta \le 4/9\delta^2$. This theorem and these two
corollaries are minor generalizations even of the corresponding one-dimensional
inequalities, for it is no longer assumed that the probability distribution $f(t)$
has but one mode.

PROOF OF THEOREM. Let $g(x) = y(x)$ if $0 \le x \le x_0 \le \infty$, let $g(x) = y(-x)$
if $-\infty \le -x_0 \le x \le 0$, and let $g(x) = 0$ elsewhere in $(-\infty, \infty)$. Then $g(x)$
has all the properties explicitly required of $f(x)$ in a former paper by this author
[4] in which this theorem was proved for the one-dimensional case. That is:
$g(x)$ is a frequency function whose mean is zero, and

$\int_{-\infty}^{\infty} g(x)\,dx = 2, \quad \text{and} \quad \int_{\delta\sigma}^{\infty} g(x)\,dx$

is the probability that $|x| > \delta\sigma$; $g(x)$ is a monotonic decreasing function of
$|x|$ for all values of $x$; and is symmetrical with respect to the central ordinate.
Therefore, transforming the symbols of that paper to our present notation, we
have

$\int_{\delta\sigma}^{\infty} g(x)\,dx \le \alpha_{2r} \Big/ \left(\delta\cdot\frac{2r+1}{2r}\right)^{2r},$

where

$\sigma^2 = \int_0^\infty x^2 g\,dx = \int_0^\infty x^2 y\,dx = \hat\sigma^2.$

Similarly $\mu_{2r} = \hat\mu_{2r}$, $\alpha_{2r} = \hat\alpha_{2r}$, and finally

$1 - P_\delta = 1 - \int_0^{\delta\hat\sigma} y\,dx = 1 - \int_0^{\delta\sigma} g\,dx = \int_{\delta\sigma}^{\infty} g\,dx \le \alpha_{2r} \Big/ \left(\delta\cdot\frac{2r+1}{2r}\right)^{2r} = \hat\alpha_{2r} \Big/ \left(\delta\cdot\frac{2r+1}{2r}\right)^{2r}.$

This proves the theorem except that there is one exceptional case that requires
attention. In the proof of the theorem in the paper just referred to the author
assumed that the function corresponding to our present $g(x)$ was continuous.
At that time a "frequency" function was often thought of as determined by a
smooth curve approximating a histogram and implied even the existence of
derivatives, and so continuity was not added to the explicit requirement that
the function be a "frequency" function, but this condition was explicitly
introduced in the lemma on which the proof of the theorem was based, and so we do
now have to consider separately the case where $y$, and hence $g$, may have a finite
number of jumps. It is quite easy to handle this case as the limiting form of a
continuous case. In that lemma it was also required that $d^2Q/dt^2$ should exist
and be non-negative, which would imply that we now have to make the
requirement that $y$ (corresponding to $dQ/dt$) shall have a non-negative first derivative.
On examination of the proof, however, it will be observed that this is not
necessary, since $y$ is monotonic decreasing and continuous. That is, in the lemma the
only use made of the condition, $d^2Q/dt^2 \ge 0$, was that the function $Q(t)$ should
determine a curve which would be never concave down. But for this it is
sufficient that $dQ/dt$ be continuous and monotonic increasing, and these
conditions are now satisfied by the function which plays the role of $Q$ in the present
discussion. This function will now be defined as


$\int \psi(x)\,dx.$

Let $\psi(x)$ be a continuous function defined as equal to $g(x)$ except in the
neighborhood of the points of finite discontinuity. Near such points it is to be so
defined that it shall have all the properties just required of $g(x)$, and in addition
so that, for any prescribed $R > 1$ and $\epsilon > 0$,

$\int_0^\infty x^r \psi(x)\,dx = \int_0^\infty x^r g(x)\,dx + \eta_r, \quad (1 \le r \le R);$

$\int \psi(x)\,dx = \int g(x)\,dx + \eta',$

where $|\eta_r|, |\eta'| < \epsilon$. It is obvious that such a definition of $\psi$ may be made in
many ways, and one of them is by making use of a linear function in the
neighborhood of each point of discontinuity. Since $\psi(x)$ now satisfies all the
conditions of the author's earlier paper the corresponding inequality is true:

$\left(\int_{\delta\sigma_1}^{\infty} \psi\,dx\right)\left(\delta\cdot\frac{2r+1}{2r}\right)^{2r}\left(\int_0^\infty x^2 \psi\,dx\right)^{r} \le \int_0^\infty x^{2r} \psi\,dx,$

where

$\sigma_1^2 = \int_0^\infty x^2 \psi\,dx.$

Hence the same inequality holds with $g$ in place of $\psi$, apart from terms which are
less than $\epsilon$ in absolute value. Let $\epsilon$ approach zero and we have, as desired:

$1 - P_\delta \le \hat\alpha_{2r} \Big/ \left(\delta\cdot\frac{2r+1}{2r}\right)^{2r}.$


EXAMPLE 2. Let

$f(t_1, \ldots, t_n) = A \exp\left\{-\tfrac12\left(\frac{t_1^2}{\sigma_1^2} + \cdots + \frac{t_n^2}{\sigma_n^2}\right)\right\}, \quad A = (2\pi)^{-n/2}(\sigma_1 \cdots \sigma_n)^{-1}.$

This is a form into which the general correlation solid may be put by means of a
linear transformation. Since $P_\delta$ is a ratio between two parts of such a solid and
since this ratio is preserved under a linear transformation, the more general case
may be transformed into this one, or even, as will appear shortly, into the simpler
one where all the standard deviations are unity. If $f = \lambda$ the contour is the
ellipsoid,

$\frac{t_1^2}{\sigma_1^2} + \cdots + \frac{t_n^2}{\sigma_n^2} = -2\log\frac{\lambda}{A}.$

The volume of this ellipsoid is

$x = h(-2\log y/A)^{n/2}, \quad h = V_0\,\sigma_1 \cdots \sigma_n, \quad V_0 = \frac{2\pi^{n/2}}{n\,\Gamma(n/2)}.$

Hence $y = A e^{-\frac12 (x/h)^{2/n}}$,

$\hat\mu_r = \int_0^\infty x^r y\,dx = \frac{2^{nr/2 + r}\,\pi^{nr/2}\,(\sigma_1 \cdots \sigma_n)^r\,\Gamma\!\left(\frac{nr+n}{2}\right)}{n^r\,[\Gamma(n/2)]^{r+1}}.$

Putting $r = 2$ we obtain

$\hat\sigma^2 = \frac{\pi^n\,2^{n+2}\,(\sigma_1 \cdots \sigma_n)^2\,\Gamma(3n/2)}{n^2\,[\Gamma(n/2)]^3},$

and then

$\hat\alpha_{2r} = \frac{\Gamma\!\left(\frac{2rn+n}{2}\right)}{\Gamma(n/2)}\left[\frac{\Gamma(n/2)}{\Gamma(3n/2)}\right]^{r}.$
Our inequality becomes: $1 - P_\delta \le J$, where

$J = \hat\alpha_{2r} \Big/ \left(\delta\cdot\frac{2r+1}{2r}\right)^{2r}$, or 1, whichever is smaller.

Typical numerical values of $\hat\alpha_{2r}$ and of $J$ are given in Tables I and II.
TABLE I
Values of $\hat\alpha_{2r}$

  n    $\hat\alpha_{2r}$                                  $\hat\alpha_2$    $\hat\alpha_4$    $\hat\alpha_6$
  1    $1\cdot 3 \cdots (2r-1)$                           1      3        15
  2    $(2r)!/2^r$                                        1      6        90
  3    $3\cdot 5\cdot 7 \cdots (6r+1)/(3\cdot 5\cdot 7)^r$   1      12.26    566
  4    $(4r+1)!/(5!)^r$                                   1      25.20    3604














TABLE II
Values of J

  δ    n    r    J
  1    1    1    0.444
            2    1.000
  1    2    1    0.444
            2    1.000
  2    1    1    0.111
            2    0.077
            3    0.093
  3    1    1    0.049
            2    0.015
            3    0.008
            4    0.006
            5    0.006
  3    2    1    0.049
            2    0.030
            3    0.049
  3    3    1    0.049
            2    0.062
            3    0.308
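
The tabulated values can be recomputed from the gamma-function expression for the contour moment ratio in Example 2 and from the definition of J. The sketch below is an illustration only; the function names `alpha_hat` and `J` and the selection of cases printed are choices made here.

```python
import math

def alpha_hat(n, r):
    """Contour moment ratio of order 2r for the n-dimensional normal law (Example 2)."""
    return (math.gamma((2 * r * n + n) / 2) / math.gamma(n / 2)
            * (math.gamma(n / 2) / math.gamma(3 * n / 2)) ** r)

def J(delta, n, r):
    """Upper bound for 1 - P_delta given by the theorem, or 1, whichever is smaller."""
    return min(1.0, alpha_hat(n, r) / (delta * (2 * r + 1) / (2 * r)) ** (2 * r))

# A few rows of Table I (r = 2, 3) and of Table II
for n in (1, 2, 3, 4):
    print(n, round(alpha_hat(n, 2), 2), round(alpha_hat(n, 3)))
for delta, n, rs in [(1, 1, (1, 2)), (2, 1, (1, 2, 3)), (3, 2, (1, 2, 3)), (3, 3, (1, 2, 3))]:
    for r in rs:
        print(delta, n, r, round(J(delta, n, r), 3))
```

For a given δ and n, the sharpest bound is obtained by taking the value of r that minimizes J, which is how the table is meant to be read.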
Let us now compare $J$ with the true value of $(1 - P_\delta)$ in one of these cases,
viz., when $\delta = 3$ and $n = 3$. The true value is given by

$1 - P_\delta = 1 - A\int_0^{3\hat\sigma} e^{-\frac12 (x/h)^{2/3}}\,dx,$

where now $\hat\sigma = 4\pi\sqrt{105}\,(\sigma_1\sigma_2\sigma_3)/3$, $h = 4\pi(\sigma_1\sigma_2\sigma_3)/3$. The integral may be
evaluated by means of the transformation, $t = (x/h)^{1/3}$, and a table of the integral
of $(2\pi)^{-1/2} e^{-t^2/2}(t^2 - 1)$. We obtain: $1 - P_\delta = 0.0205$. This is the true value
to be compared with the approximation, $J = 0.049$. The closeness of this
approximation is similar to that which may be obtained for the normal law by
using the corresponding inequalities for one dimension. To illustrate this we
find from the usual tables that, if for the normal law $1 - P_\delta = 0.0205$, $\delta = 2.32$.
Hence the corresponding inequality is (for $r = 2$): $1 - P_\delta \le 0.042$.

We shall now show that the upper bound of our inequality is actually attained
in a special case. Let $f(t_1, \ldots, t_n) = 2^{-n}$ in the region $(-1 \le t_1, \ldots, t_n \le 1)$,
and let $f = 0$ elsewhere. For this case we shall have $x = 0$ when $\lambda = 2^{-n}$, and
$x = 2^n$ when $0 \le \lambda < 2^{-n}$. Therefore $y = 2^{-n}$ if $0 \le x < 2^n$, and $y = 0$
if $2^n < x$. Hence $\hat\sigma = 2^n/\sqrt{3}$, $\hat\mu_0 = 1$, and the true value of $(1 - P_\delta)$ is
$1 - \delta/\sqrt{3}$; and when $\delta = 2/\sqrt{3}$, this true value is 1/3. The appropriate
inequality is: $1 - P_\delta \le 4/9\delta^2$, and when $\delta = 2/\sqrt{3}$ the right hand side of this
inequality is also equal to 1/3. These relationships are true for all values of $n$.
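
The case of equality can be checked in a few lines. The sketch below, with function names chosen here for the illustration, evaluates the true value $1 - \delta/\sqrt{3}$ and the bound $4/9\delta^2$ for the uniform distribution on the cube, confirms that they coincide at $\delta = 2/\sqrt{3}$, and verifies on a grid of δ values that the bound is never below the true value.

```python
import math

def true_tail(delta):
    """True value of 1 - P_delta for the uniform law on the cube: 1 - delta/sqrt(3), floored at 0."""
    return max(0.0, 1.0 - delta / math.sqrt(3))

def bound_r1(delta):
    """Right-hand side of Corollary 2 (r = 1), capped at 1."""
    return min(1.0, 4.0 / (9.0 * delta ** 2))

delta_star = 2.0 / math.sqrt(3)
print(true_tail(delta_star), bound_r1(delta_star))   # both approximately 1/3

# The bound should dominate the true value everywhere on the grid
grid = [0.05 * i for i in range(1, 80)]
gaps = [bound_r1(d) - true_tail(d) for d in grid]
print(min(gaps))
```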
BOUNDARIES OF MINIMUM SIZE IN BINOMIAL SAMPLING


By R. L. PLACKETT
University of Liverpool


1. Introduction. Much attention has recently been concentrated on the prob- 
lems arising when sampling a binomial population, since this is thought to form a 
suitable model for certain industrial and biological procedures. A general 
discussion of such procedures as applied in industry has been given by Barnard 
[2] and various particular cases have received detailed treatment by Burman [3],
Stockman and Armitage [6], and Anscombe [1]. Unbiased estimation of the
population parameter (the “fraction defective’) has been investigated by 
Girshick, Mosteller and Savage [4] and Wolfowitz [7]. A paper by Haldane [5] 
is also relevant. 

For such sampling procedures it is necessary to find the probabilities of accept- 
ing or rejecting material with a particular fraction defective; to calculate the 
average sample size; and to form an estimate of the fraction defective when 
sampling terminates. All three characteristics may be expressed in terms of 
quantities N(x, y), defined in section 3, so that once these are known, the funda- 
mental properties of the scheme are known. 

Here we present a method for determining the N(x, y); investigate the
conditions under which it is valid; relate the method to the estimation problem; and
exemplify its application. The schemes to which the method can successfully 
be applied are of a special type (to which the title refers) and include all inspec- 
tion procedures with a finite upper limit to the sample size likely to be used in 
practice. Other schemes, when dissected in a manner similar to that used by 
Stockman and Armitage, can doubtless be formulated as an aggregate of the 
special types. 


2. Nomenclature. Our nomenclature differs in some respects from that of 
Girshick, Mosteller and Savage, although the same collection of terms is em- 
ployed. References to their paper should therefore be followed by a comparison 
of the terminology. 

Taking a sample of one from a binomial population consists in observing either
of two events, whose probabilities are $p$ and $1 - p$ ($p \ne 0$ or 1). The results
of successive samples of one can be represented by the path of a particle in a
two-dimensional lattice of points with non-negative integer co-ordinates. This
particle starts at the origin 0 and at any point (x, y) travels to (x + 1, y) if the
event whose probability is $p$ has occurred, otherwise to (x, y + 1). Sampling
terminates when the particle reaches a boundary point, and the set of such
points is denoted by B. Any point which can be reached during sampling,
including the boundary points, is accessible, and any path from the origin to a
point of B which can be traversed during sampling is admissible; all other points
are inaccessible and all other paths inadmissible. The index of a point is the sum
of its coordinates.
It will probably help to note in particular that whereas Girshick, Mosteller and
Savage used p to correspond to events causing the y co-ordinate to increase, we 
use it for x. 


3. Determination of N(x, y). The set B determines the sampling scheme and
we are concerned with schemes in which all points of index greater than n,
the finite maximum index of points in B, are inaccessible. This condition
guarantees that if N(x, y) denotes the number of admissible paths from the origin to
a point (x, y) of B

$\sum N(x, y)\,p^x (1 - p)^y = 1,$

the summation being over all boundary points. Consequently, to determine
N(x, y) equate coefficients of $p$ in this identity, the coefficient of $p^0$ in the left
hand side being 1 and all others zero. When all the N(x, y) are known, the
probability of reaching any subset of B can be calculated and the characteristics
of the scheme found.
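
The identity can be illustrated on a small scheme. The sketch below is not the method of this paper: it counts the admissible paths directly by following the particle through the lattice, and then checks with exact rational arithmetic that the sum over the boundary equals 1. The boundary used (stop at two events of either kind) and the function name `count_paths` are assumptions made for the example.

```python
from fractions import Fraction

def count_paths(boundary):
    """Number of admissible paths N(x, y) from the origin to each point of the boundary."""
    n = max(x + y for x, y in boundary)          # maximum index of points in B
    reach = {(0, 0): 1}
    for index in range(n):
        for x in range(index + 1):
            y = index - x
            ways = reach.get((x, y), 0)
            if ways == 0 or (x, y) in boundary:  # inaccessible, or sampling has stopped
                continue
            for nxt in ((x + 1, y), (x, y + 1)):
                reach[nxt] = reach.get(nxt, 0) + ways
    return {pt: reach.get(pt, 0) for pt in boundary}

# Hypothetical scheme: stop as soon as either event has occurred twice (n = 3).
B = {(2, 0), (2, 1), (1, 2), (0, 2)}
N = count_paths(B)
print(sorted(N.items()))

# Check the identity for several values of p, using exact fractions
totals = []
for p in (Fraction(1, 3), Fraction(1, 2), Fraction(7, 10)):
    totals.append(sum(N[(x, y)] * p ** x * (1 - p) ** y for x, y in B))
print(totals)
```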

Sometimes it will be convenient to use

$\sum N(x, y)\,q^y (1 - q)^x = 1,$

where $q = 1 - p$, but the resulting set of equations cannot be independent of the
first set since if

$\sum_s a_s p^s = \sum_t b_t (1 - p)^t,$

then

$a_s = \sum_t (-1)^s \binom{t}{s} b_t.$

The polynomial in either $p$ or $q$ is of degree n; the application of this method
alone is therefore limited to boundaries containing at most (n + 1) points,
otherwise the number of unknowns exceeds the number of equations for them.


4. Properties of the boundary.

THEOREM 1. If n is the maximum index of points in B and if any point of
greater index is inaccessible, then B contains at least n + 1 points.

There must be at least two boundary points of index n for any such point
$(a_n, b_n)$ must be approached from $(a_n - 1, b_n)$ or $(a_n, b_n - 1)$; in which case
either $(a_n - 1, b_n + 1)$ or $(a_n + 1, b_n - 1)$ is a boundary point. Let P be any
one of these points. At least one admissible path exists from 0 to P; suppose
one such path to consist of the points $(a_0, b_0), (a_1, b_1), \ldots, (a_n, b_n)$ where
$a_k + b_k = k$ $(k = 0, 1, 2, \ldots, n)$. It is clear that one or more boundary points exist
on the line $x = a_k$, having $y \ge b_k$, for otherwise the particle could travel
indefinitely along this line; similarly one or more exist on $y = b_k$ with $x \ge a_k$; and if
there is just one on each they cannot be identical unless $k = n$ since $(a_k, b_k)$ is
not then a boundary point. Initially $(a_0, b_0)$ contributes two boundary points;
since then either $a_{k+1} = a_k$ and $b_{k+1} \ne b_k$ or $a_{k+1} \ne a_k$ and $b_{k+1} = b_k$ it follows
that each succeeding point up to and including $(a_{n-1}, b_{n-1})$ contributes at least
one more; the point $(a_n, b_n)$ is counted as soon as $x$ reaches $a_n$ or $y$ reaches $b_n$,
whichever occurs first. Consequently there are at least n + 1 boundary points.

Reversely, if the boundary contains n + 1 points whose maximum index is
m, such that any point of greater index is inaccessible, then $m \le n$. For suppose
$m > n$ and apply the preceding result.

An important class of boundaries therefore comprises those with the minimum 
number of points necessary to attain a given maximum index; they may con- 
veniently be termed boundaries of minimum size and for them alone the method 
of equating coefficients yields the number of equations equal to the number of 
unknowns, the first being otherwise less than the second. 
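
For a boundary of minimum size the method of equating coefficients gives a square system, which can be solved exactly. The sketch below assumes the same hypothetical scheme as before (stop at two events of either kind, n = 3, four boundary points). The coefficient of $p^j$ in $N(x, y)\,p^x(1-p)^y$ is $(-1)^{j-x}\binom{y}{j-x}N(x, y)$ for $0 \le j - x \le y$, and the system is solved by Gauss-Jordan elimination over the rationals; the function name `solve_counts` is an assumption made here.

```python
from fractions import Fraction
from math import comb

def solve_counts(boundary):
    """Solve for N(x, y) by equating coefficients of p; needs exactly n + 1 boundary points."""
    pts = sorted(boundary)
    n = max(x + y for x, y in pts)
    m = len(pts)
    rows = []
    for j in range(n + 1):
        # Coefficient of p**j contributed by each unknown N(x, y)
        row = [Fraction((-1) ** (j - x) * comb(y, j - x)) if 0 <= j - x <= y else Fraction(0)
               for x, y in pts]
        rows.append(row + [Fraction(1 if j == 0 else 0)])   # right side: 1 for p**0, else 0
    for col in range(m):
        piv = next(r for r in range(col, len(rows)) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return {pt: rows[i][m] for i, pt in enumerate(pts)}

B_min = {(2, 0), (2, 1), (1, 2), (0, 2)}   # hypothetical boundary of minimum size, n = 3
solution = solve_counts(B_min)
print(sorted(solution.items()))
```

The values obtained agree with a direct count of the admissible paths for this scheme.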

If there are exactly n + 1 boundary points then $(a_1, b_1), (a_2, b_2), \ldots, (a_{n-1}, b_{n-1})$
must each contribute to just one; since $a_{k+1} = a_k$ or $a_k + 1$ there is one
point of B on each of the lines $x = 0, x = 1, \ldots, x = a_n$ and this set of points
$(0, d_0), (1, d_1), \ldots, (a_n, b_n)$ can be denoted by U, the upper part of the boundary.
Clearly $d_{k+1} \ge d_k - 1$ for otherwise more than one boundary point is required
on the line $x = k + 1$. Similarly, there must be a second group of points of B
$(c_0, 0), (c_1, 1), \ldots, (a_n, b_n)$ with $c_{k+1} \ge c_k - 1$ forming the lower boundary L;
and all (n + 1) points have now been enumerated, the point P belonging to both
U and L. The characteristic of such sets B is that the sequences U and L both
have monotonically non-decreasing index; the special case of sequences with
monotonically increasing index provides the rejection and acceptance boundaries
of non-rectifying industrial inspection procedures. (The difference between
rectifying and non-rectifying procedures is clearly stated in the introduction to
Anscombe [1]).

THEOREM 2. For boundaries of minimum size any two accessible points not in B 
of the same index m cannot be separated on the line x + y = m by boundary or in- 
accessible points In the terminology of Girshick, Mosteller and Savage the 
accessible points not in B form a simple region. 

Let $Q(x_1, y_1)$ and $R(x_2, y_2)$ be any two such accessible points of index m and
suppose $x_1 < x_2$. There are two possibilities: $(a_m, b_m)$ does or does not lie
between Q and R.

(i) $(a_m, b_m)$ lies between Q and R, i.e. $x_1 < a_m < x_2$. In this case there must
be points of B at $Q'(x_1, Y_1)$ with $Y_1 > y_1$ and at $R'(X_2, y_2)$ with $X_2 > x_2$. The
boundary from Q' to P and from R' to P has non-decreasing index; hence all
points of U on the lines $x = x_1, x = x_1 + 1, \ldots, x = a_m - 1$ have index at
least $x_1 + Y_1 > m$; similarly all points of L on the lines $y = y_2, y = y_2 + 1, \ldots,
y = b_m - 1$ have index at least $X_2 + y_2 > m$. By definition of the boundary
there are no additional points of B on either group of lines between the path OP
and the line x + y = n, so the proof of the theorem is completed.

(ii) If $x_1 > a_m$ or $x_2 < a_m$ the proof is precisely analogous to that given in (i).


5. Justification of the method. THEOREM 3. For boundaries of minimum size
the equations for N(x, y) are soluble and of rank n + 1.

To prove this we give a general method of solution for the system of equations,
using powers of p and q alternately: as already remarked, this is equivalent to
using the equations from the coefficients of powers of p only. In the first place,
note that the coefficient of $q^u$ is a linear combination of numbers N(x, y) with
$x + y \ge u$ and $y \le u$; and the coefficient of $p^t$ has $x + y \ge t$ and $x \le t$.


Let $s = \operatorname{Min}(d_0, d_1, d_2, \ldots, b_n) - 1$.


Then from the coefficients of $q^0, q^1, \ldots, q^s$ can successively be determined
$N(c_0, 0), N(c_1, 1), \ldots, N(c_s, s)$, the matrix of the equations being triangular with
ones in the main diagonal. The points in U at $(r_1, s + 1), (r_2, s + 1), \ldots$ now
appear in the coefficients of $q^{s+1}, q^{s+2}, \ldots$ and complicate the solution.


Let $r = \operatorname{Max}(r_1, r_2, \ldots)$.


If either $(r, d_r)$ or $(c_s, s)$ is the point P then all the remaining N(x, y) can
successively be determined from the coefficients of powers of p when the values
of $N(c_0, 0), N(c_1, 1), \ldots, N(c_s, s)$ are substituted in the equations. Otherwise
the path OP for $y \ge s + 1$ must have $x \ge r + 1$ so that all points of L on
$y \ge s + 1$ have $x \ge r + 2$, i.e. any point of L on $x = 0, x = 1, \ldots, x = r$ has
$y \le s$; for such points the number of admissible paths is now known. Therefore
from the coefficients of $p^0, p^1, \ldots, p^r$ can successively be determined $N(0, d_0),
N(1, d_1), \ldots, N(r, d_r)$, the matrix of these unknowns being again triangular;
in particular $N(r_1, s + 1), N(r_2, s + 1), \ldots$ can now be found.

Let $s_1 = \operatorname{Min}(d_{r+1}, d_{r+2}, \ldots, b_n) - 1$, so that $s_1 > s$. The coefficients
of $q^{s+1}, q^{s+2}, \ldots, q^{s_1}$ give successively $N(c_{s+1}, s + 1), N(c_{s+2}, s + 2), \ldots,
N(c_{s_1}, s_1)$. For the points in U at $(r_{11}, s_1 + 1), (r_{12}, s_1 + 1), \ldots$, let


$r_1 = \operatorname{Max}(r_{11}, r_{12}, \ldots)$.


Since there is only one point of U on each line x = constant, $r_1 > r$. As
before, if either $(r_1, d_{r_1})$ or $(c_{s_1}, s_1)$ is P the remaining points of U are soon
determined. Otherwise the process continues and there result an increasing sequence
of points of L and a similar sequence for U; the process terminates when
$(a_n, b_n)$ has been reached in both, when all N(x, y) will have been found.

It is clear that for particular cases alternative methods of solution will prove 
more convenient. 


6. Connection with estimation. Suppose that the point (t, u) is accessible and
let N*(x, y) be the number of admissible paths from (t, u) to (x, y) where (x, y)
is in B. Then Girshick, Mosteller and Savage have shown that N*(x, y)/N(x, y)
is an unbiased estimate of $p^t(1 - p)^u$; and a necessary and sufficient condition
for it to be the unique unbiased estimate is that the accessible points not in B
form a simple finite region. Hence from Theorem 2 such estimates are unique
for schemes with boundaries of minimum size. An alternative proof is given by
considering that if two unbiased estimates of any function of p exist and f(x, y)
is the difference between them at (x, y), then

$\sum_{B} f(x, y)\,N(x, y)\,p^x(1 - p)^y = 0,$

where f(x, y) is not everywhere zero. The equations formed by equating
coefficients have rank (n + 1) as shown by Theorem 3, so that the only solution is
f(x, y)N(x, y) = 0. Since each N(x, y) is certainly positive it follows at once that
f(x, y) = 0 and there can only be one unbiased estimate.


7. An illustration. As an application of the method we take the interesting
rectifying sequential inspection scheme discussed by Anscombe. The boundary
points are at $(H, 0), (H + b, 1), \ldots, (H + ub, u)$, where u is the greatest integer
less than (N − H)/(b + 1), and thereafter on the line x + y = N. The equations
for N(x, y) take here their simplest form, namely equation (4) of Barnard's
paper. From the coefficients of $q^0, q^1, \ldots, q^u, \ldots$,


$1 = N(H, 0);$

$0 = N(H + b, 1) - H\,N(H, 0)$, whence $N(H + b, 1) = H$;

$0 = N(H + 2b, 2) - \binom{H + b}{1} H + \binom{H}{2}$, whence $N(H + 2b, 2) = \dfrac{H(H + 2b + 1)}{2!}$;

$0 = N(H + 3b, 3) - \binom{H + 2b}{1} \dfrac{H(H + 2b + 1)}{2!} + \binom{H + b}{2} H - \binom{H}{3}$,

whence $N(H + 3b, 3) = \dfrac{H(H + 3b + 2)(H + 3b + 1)}{3!}$.


It now appears reasonable to guess the general term as


$N(H + yb, y) = \dfrac{H(H + yb + y - 1)(H + yb + y - 2) \cdots (H + yb + 1)}{y!}.$


The proof is therefore complete if we show


$\binom{H}{y} - \binom{H + b}{y - 1} H + \binom{H + 2b}{y - 2} \dfrac{H(H + 2b + 1)}{2!} - \binom{H + 3b}{y - 3} \dfrac{H(H + 3b + 2)(H + 3b + 1)}{3!} + \cdots + (-1)^y \dfrac{H(H + yb + y - 1)(H + yb + y - 2) \cdots (H + yb + 1)}{y!} = 0.$


Put $(b + 1) = \xi$, and the left hand side, divided by the common factor H, becomes


$\dfrac{(H - 1)!}{(H - y)!\,y!} - \dfrac{(H + \xi - 1)!}{(H + \xi - y)!\,(y - 1)!\,1!} + \dfrac{(H + 2\xi - 1)!}{(H + 2\xi - y)!\,(y - 2)!\,2!} - \cdots + (-1)^y \dfrac{(H + y\xi - 1)!}{(H + y\xi - y)!\,0!\,y!},$


which is $1/y$ times the coefficient of $t^{H-y}$ in $(1 + t)^{H-1} \times [1 - (1 + t)^{\xi} t^{-\xi}]^y$.
Rewriting the latter as $(1 + t)^{H-1}[1 - (1 + t^{-1})^{\xi}]^y$, it becomes clear that the
highest power of t is $t^{H-y-1}$, whence the required result follows.
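
The general term guessed above can be checked numerically by direct path counting. The following is a minimal sketch, not part of the original paper: the function names `count_paths` and `closed_form` are hypothetical, a path is assumed to move one unit in x or one unit in y per step and to stop at the first boundary point it meets, and the closed form is written in the algebraically equivalent shape H/(H + yξ) times a binomial coefficient with ξ = b + 1.

```python
from math import comb


def count_paths(H, b, y_max):
    """Count admissible paths from (0, 0) to each boundary point (H + y*b, y).

    Each step increases x by one or y by one, and a path stops at the
    first boundary point it meets.
    """
    boundary = {(H + y * b, y) for y in range(y_max + 1)}
    ways = {(0, 0): 1}
    result = {}
    for y in range(y_max + 1):
        for x in range(H + y * b + 1):
            n = ways.get((x, y), 0)
            if n == 0:
                continue
            if (x, y) in boundary:
                result[y] = n  # paths are absorbed here
                continue
            ways[(x + 1, y)] = ways.get((x + 1, y), 0) + n
            ways[(x, y + 1)] = ways.get((x, y + 1), 0) + n
    return result


def closed_form(H, b, y):
    """H (H + yb + y - 1) ... (H + yb + 1) / y!, written as a binomial ratio."""
    xi = b + 1
    return H * comb(H + y * xi, y) // (H + y * xi)


# Small check with H = 2, b = 1: boundary points (2, 0), (3, 1), (4, 2).
print(count_paths(2, 1, 2))
print([closed_form(2, 1, y) for y in range(3)])
```

The sketch only covers the sloping part of the boundary; the points on the line x + y = N are left out because they do not enter the equations used for the general term.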


NOTES

This section is devoted to brief research and expository articles and other short items.


NON-PARAMETRIC TOLERANCE LIMITS¹
By R. B. MURPHY


Princeton University 


1. Summary. In this note are presented graphs of minimum probable population
coverage by sample blocks determined by the order statistics of a sample
from a population with a continuous but unknown cumulative distribution function
(c.d.f.). The graphs are constructed for the three tolerance levels .90,
.95, and .99. The number, m, of blocks excluded from the tolerance region runs
as follows: m = 1(1)6(2)10(5)30(10)60(20)100, and the sample size, n, runs from
m to 500.

Thus the curves show the solution, $\beta$, of the equation $1 - \alpha =
I_\beta(n - m + 1, m)$ for $\alpha$ = .90, .95, .99 over the range of n and m given above,
where $I_x(p, q)$ is Pearson's notation for the incomplete beta function.

Examples are cited below for the one- and two-variate cases. Finally, the 
exact and approximate formulae used in computations for these graphs are given. 


2. Introduction. Suppose a sample of size n is drawn from a population having
a continuous cumulative distribution function (c.d.f.), F(x). Let the sample
values arranged in order of increasing magnitude be $x_1, x_2, \ldots, x_n$. The fraction,
u, of the population which is included between $x_r$ (the r-th smallest value
in the sample) and $x_{n-s+1}$ (the s-th largest value) is $F(x_{n-s+1}) - F(x_r)$. This
quantity u has been called the population coverage for the interval $(x_r, x_{n-s+1})$.
The probability element for this coverage is

(2.1)  $f(u)\,du = \dfrac{\Gamma(n + 1)}{\Gamma(n - m + 1)\,\Gamma(m)}\, u^{n-m}(1 - u)^{m-1}\,du$

where m = r + s. From (2.1) we can calculate the probability that this coverage
is at least a given amount, say $\beta$. If we call this probability $\alpha$, we have


(2.2)  $\alpha = \int_{\beta}^{1} f(u)\,du.$


The quantity $\alpha$ is the probability that at least $100\beta\%$ of the population will be
included between $x_r$ and $x_{n-s+1}$, and it is called the tolerance level. This probability
depends only on n and m (= r + s).

1 All computations involved in this paper were carried out under an Office of Naval
Research contract.

The idea of coverage is more general than it first appears. If we think of
$x_1, x_2, \ldots, x_n$ as points plotted along the x-axis, we will then have n + 1
intervals: $(-\infty, x_1), (x_1, x_2), \ldots, (x_n, +\infty)$, which, following Tukey [3], we will
call blocks. The reason for this term will be clear when we deal with the case of a
sample from a population of more than one variable. The coverage for the i-th
block $(x_i, x_{i+1})$ is $F(x_{i+1}) - F(x_i)$. The probability element of the sum of the
coverages of any preassigned group of n − m + 1 blocks is given by (2.1) and
hence the probability $\alpha$ that the fraction of the population covered by any
n − m + 1 blocks is at least $\beta$ is given by (2.2). By preassigned blocks we mean ones
designated by order statistics prior to obtaining any sample from which a prediction is
to be made with these blocks. In general it is not legitimate, after taking a sample 
and for some reason evident only then, to specify which blocks in this sample are 
to be included or excluded from the coverage. There is no objection, however, 
to specifying a scheme of blocks for the coverage on the basis of past samples 
when the scheme is to be applied to future samples. 

The purpose of this note is to present graphs of $\beta$ as a function of n for m =
1(1)6(2)10(5)30(10)60(20)100 and for $\alpha$ = .90, .95, .99. There are three figures:
Figure 1 gives curves for $\alpha$ = .90, Figure 2 for $\alpha$ = .95, and Figure 3 for $\alpha$ =
.99. The graphs are accurate to at least two decimal places but never more than
three. In terms of the Pearson notation (2.2) gives, after minor alteration,
$1 - \alpha = I_\beta(n - m + 1, m)$. Hence these graphs may also be used to find
the 10, 5 and 1 per cent points of a variate X (0 < X < 1) with the c.d.f. $I_x(p, q)$
for $1 \le p \le 500$ and $1 \le q \le 100$.


3. Computations for the graphs. If in the relation (2.2) three of the arguments
$\alpha$, $\beta$, m, and n are given, the solution for the fourth may often be found
in Pearson [5] or Thompson [6]. The values of $\beta$ through n = 100 were computed
exactly for these graphs. For larger n, $\beta$ was computed approximately
from


(3.1) = 


where $\chi^2_\alpha$ is determined by the relation

$\Pr(\chi^2 \ge \chi^2_\alpha) = 1 - \alpha$

and has 2m degrees of freedom. This approximation is due to Scheffé and Tukey.
For large m the Cornish-Fisher approximation to $\chi^2_\alpha$ was used.


4. Illustrations of the one-variate case. The most common use to which
the graphs presented here may be put is in the prediction of $\beta$ in sampling from
a distribution of a single random variable. It is this case that was first presented
by Wilks [1]. Suppose in the mass production of a certain type of screw one is
interested in the least proportion of all screws manufactured that have lengths
between the least and greatest lengths appearing in a random sample of 100
screws. It is assumed that we do not know the distribution of the length, Y,
of a screw produced in this process. Furthermore, it is assumed, of course, that
the manufacturing process is in a state of statistical control in the sense of
Shewhart. We plan to discard two blocks: $(-\infty, x_1)$ and $(x_{100}, +\infty)$ (exactly
as many blocks as observations). At the level $\alpha$ = .99 we obtain from Figure 3
that at least 93.5% of all screws in the population sampled have lengths that fall
between $x_1$ and $x_{100}$. If we now draw a random sample of 100 screws and find
the least and greatest screw lengths to be 1.40 and 1.60 inches respectively, we
may say that at least 93.5% of all screws from the population sampled have
lengths between 1.40 and 1.60 inches at the .99 tolerance level. It must be
observed that the prediction is made on the basis of preassigned order statistics,
and not of the values 1.40 and 1.60.

FIG. 1. Graphs of Population Coverage for the Tolerance Level .90.
FIG. 2. Graphs of Population Coverage for the Tolerance Level .95.
FIG. 3. Graphs of Population Coverage for the Tolerance Level .99.

We might equally well have put the question in another way: If we want
at least 93.5% of the lengths of all screws to lie within the range of lengths of a
sample of 100 screws, then at the tolerance level $\alpha$ = .99 what is the smallest
sample we could have in which as many as 2% of the sample are not acceptable?
Examining the intersections of the curves in Figure 3 with the line $\beta$ = .935 we
choose the smallest n such that $m/n \le .02$ and find n = 100.
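
The figure of 93.5% read from the graph can be cross-checked by solving $1 - \alpha = I_\beta(n - m + 1, m)$ numerically. The following is a minimal sketch, not the computing method used for the graphs: the names `coverage_probability` and `tolerance_beta` are hypothetical, the incomplete beta function with integer arguments is evaluated through its standard finite binomial-sum form, and the root is found by plain bisection.

```python
from math import comb


def coverage_probability(beta, n, m):
    """I_beta(n - m + 1, m) for integer n, m, as a finite binomial sum."""
    return sum(comb(n, k) * beta**k * (1 - beta)**(n - k)
               for k in range(n - m + 1, n + 1))


def tolerance_beta(n, m, alpha, tol=1e-10):
    """Solve 1 - alpha = I_beta(n - m + 1, m) for beta by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # I_beta increases with beta, so a value that is too small
        # means the root lies to the right of mid.
        if coverage_probability(mid, n, m) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return lo


# Screw example: n = 100 observations, m = 2 discarded blocks, alpha = .99.
beta_screws = tolerance_beta(100, 2, 0.99)
print(round(beta_screws, 4))
```

For the screw example the result falls between .935 and .936, in line with the value read from Figure 3.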


5. The case of more than one variate. The ideas given in the introduction
may be extended to sampling situations involving two or more statistically
dependent variates with a continuous joint c.d.f. by means of the notion of blocks.
The abstract formulation is given by Tukey [3]. We shall restrict ourselves to
the case of two dependent variates X and Y, but the generalization is obvious.
Because of the dependence, the joint population of X and Y may be expressed
as an associated pair of values W = (X, Y). Suppose a sample of size n is drawn
from this population, and let the pairs be $w_1, w_2, \ldots, w_n$, where $w_i = (x_i, y_i)$.
If we now choose a sequence of n numerically valued functions of x and y (or of w),
$f_1(w), \ldots, f_n(w)$, let us order the $w_i$ in a sequence $w_1^{(1)}, w_2^{(1)}, \ldots, w_n^{(1)}$ such that
$f_1(w_{i+1}^{(1)}) > f_1(w_i^{(1)})$. Imagine now that the sample values are plotted in a plane
scatter diagram. We call the first block the set of points w = (x, y) such that
$f_1(w) < f_1(w_1^{(1)})$. That is, we may imagine the curve $f_1(x, y) - f_1(w_1^{(1)}) = 0$
plotted in the plane and that the first block is bounded by this curve. Then
discarding $w_1^{(1)}$ we take the n − 1 remaining $w_i$ and order them in a sequence
$w_1^{(2)}, w_2^{(2)}, \ldots, w_{n-1}^{(2)}$ such that $f_2(w_{i+1}^{(2)}) > f_2(w_i^{(2)})$. We call the second block the
set of points w = (x, y) such that $f_1(w) > f_1(w_1^{(1)})$ and also $f_2(w) < f_2(w_1^{(2)})$.
Thus the second block is bounded by the curves $f_1(x, y) - f_1(w_1^{(1)}) = 0$ and
$f_2(x, y) - f_2(w_1^{(2)}) = 0$. If we continue this process of discarding and reordering,
until all n functions $f_i$ are used, we shall obtain a division of the plane into
n + 1 non-overlapping blocks, the "extra" block arising at the last step in the
process. Then the fraction, u, of "points" (X, Y) of the joint population of
X and Y that are covered by any n − m + 1 blocks has the probability element
(2.1). Also the probability $\alpha$ that the population coverage, u, will be at least as
large as $\beta$ is given by (2.2). The n − m + 1 blocks constitute a tolerance region.
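
The discarding and reordering process described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `build_cuts` and `block_of` are hypothetical, the sample points and ordering functions are made-up values rather than data from this note, and ties are assumed not to occur.

```python
def build_cuts(points, funcs):
    """Apply the ordering functions in turn, each time cutting at and
    discarding the remaining sample point with the smallest value."""
    remaining = list(points)
    cuts = []
    for f in funcs:
        lowest = min(remaining, key=f)
        cuts.append(f(lowest))
        remaining.remove(lowest)
    return cuts


def block_of(w, funcs, cuts):
    """Return the 1-based number of the block containing the point w."""
    for i, (f, cut) in enumerate(zip(funcs, cuts)):
        if f(w) < cut:
            return i + 1
    return len(cuts) + 1  # the "extra" block left after the last cut


# Illustrative sample of n = 3 points with three ordering functions.
sample = [(0.2, 1.0), (0.5, 3.0), (0.9, 2.0)]
funcs = [lambda w: w[0], lambda w: -w[1], lambda w: w[0]]
cuts = build_cuts(sample, funcs)
print(cuts)
print(block_of((0.6, 5.0), funcs, cuts))
```

With n points and n functions the sketch yields n cut values and therefore n + 1 blocks, matching the count given in the text.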

An extension of this case has been made by Wald [2]. Namely, before a
sample is taken let us choose a numerically valued function f of w and choose
k (< n) of the $w_i$ and order them in a sequence $w_{a_1}, w_{a_2}, \ldots, w_{a_k}$ such that
$f(w_{a_{j+1}}) > f(w_{a_j})$ and $a_{j+1} > a_j$. Next, within each "strip" of the (x, y) plane
such that w = (x, y) satisfies $f(w_{a_{j+1}}) > f(w) > f(w_{a_j})$, suppose that we follow
the construction in the previous paragraph. Then the population coverage, u,
by n − m + 1 blocks from one or more of these strips or their exteriors has the
probability element (2.1).

Again the warning must be made that the above functions $f, f_1, f_2, \ldots, f_n$,
the numbers $a_1, a_2, \ldots, a_k$ and the sequence of construction must be completely
specified before samples are drawn to which this scheme is to be applied.


6. Illustrations for two variates. As an example of the use of the graphs for a
two-variate case, we use an example cited by Tippett [8]. The two variates are
the percentage of pig iron, X, and the lime consumption, Y, per cwt. of steel in
100 steel castings made under slag control. A scatter diagram is given in
Figure 4. Unfortunately the value of this example is lessened by the fact that
the block schemes were made after the sample had been taken; it does illustrate,
at least, the two simple types of scheme.

The tolerance region T (solid lines in Figure 4) resulted from the following
scheme: let $f_1(w) = y$, $f_2(w) = f_3(w) = f_4(w) = f_5(w) = f_6(w) = -y$. Now
follow the Wald procedure choosing f(w) = y with k = 6, and $a_1 = 1$, $a_2 = 18$,
$a_3 = 46$, $a_4 = 75$, $a_5 = 90$, $a_6 = 96$. Then in each strip $y_{a_{j+1}} > y > y_{a_j}$ let
$f_i(w) = x$. Considering only the blocks within the heavy line as the tolerance
region, we have, by counting the discarded blocks, m = 16.

In constructing the region T′ (broken lines in Figure 4) we also use Wald's
method, taking f(w) = y − 5x with k = 2 and $a_1 = 3$, $a_2 = 96$. In the exterior
region with $f(w) > f(w_{96})$ let all $f_i = y + 5x$ and similarly in the exterior region
$f(w) < f(w_3)$. Then in the strip $f(w_{96}) > f(w) > f(w_3)$ (i.e., in the region in
which 41 > y − 5x > −77) choose $f_1(w) = y$, $f_2(w) = f_3(w) = f_4(w) = -y$,
$f_5(w) = f_6(w) = f_7(w) = y + 5x$, and $f_8(w) = f_9(w) = -y - 5x$. Counting
the blocks outside the heavily bordered region, we have m = 17.

We obtain by interpolation $\beta$ = .80 for T and $\beta$ = .78 for T′ at the $\alpha$ = .90
level.


7. Ties. A tie is a sample point which in a coordinate system defining a set 
of order statistics coincides in one or more coordinates with other sample points. 
For instance, in the X coordinate of our example (32, 159) and (32, 185) are tied, 
and (47, 218) and (47, 218) are tied in any system of coordinates. It would
seem easier to avoid ties with regions of the type of T′ than with those of the
type of T.

The existence of ties in the population is assumed impossible, because positive
point probabilities would destroy the continuity of the c.d.f. Therefore we
attribute the ties to the crudity of measuring devices.

A procedure for handling ties is given by Tukey [4].

FIG. 4. Illustrative Tolerance Regions for Two Variates.


THE FOURTH DEGREE EXPONENTIAL DISTRIBUTION
FUNCTION¹


By Leo A. AROIAN 
Hunter College 


We shall derive a recursion formula for the moments of the fourth degree 
exponential distribution function, state its more characteristic features, and show 
how the graduation of observed distributions may be accomplished by the method 
of moments and the method of maximum likelihood. The purpose of the note 
is to make possible a wider use of this function. 

R. A. Fisher [1] introduced the fourth degree exponential function


(1)  $y_t = k \exp\{-(\beta_4 t^4 + \beta_3 t^3 + \beta_2 t^2 + \beta_1 t)\},$


where $r_1 \le t \le r_2$, $t = (x - m)/\sigma$, m indicates the population mean, $\sigma$ the
population standard deviation, and where the $\beta$'s are functions of


$\alpha_n = \int_{r_1}^{r_2} t^n y_t\,dt.$


A. L. O'Toole in two stimulating papers [2], [3], has studied (1); however his
methods and results are unnecessarily complicated. O'Toole requires eight
moments to determine parameters similar to the $\beta$'s. Both Fisher and O'Toole
considered the restricted class of (1) with range $(-\infty, \infty)$.


Let

(2)  $u = t^n \exp\{-(\beta_4 t^4 + \beta_3 t^3 + \beta_2 t^2)\}, \quad dv = e^{-\beta_1 t}\,dt$

in

(3)  $\alpha_n = \int_{r_1}^{r_2} t^n y_t\,dt,$

obtaining

(4)  $4\beta_4\alpha_{n+3} + 3\beta_3\alpha_{n+2} + 2\beta_2\alpha_{n+1} + \beta_1\alpha_n = n\alpha_{n-1}, \quad n = 1, 2, 3, \ldots,$

and for n = 0, the right side of (4) is defined as zero. The result (4) is valid under
the assumption

(5)  $uv\,\Big|_{r_1}^{r_2} = 0.$

1 Presented to the American Mathematical Society and the Institute of Mathematical
Statistics, September 4, 1947.

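Given the first six moments, $\beta_4, \beta_3, \beta_2, \beta_1$ are readily determined. It will be
found that if $\beta_4 > 0$, $\beta_3 \ne 0$, then $r_1 = -\infty$, $r_2 = \infty$; while if $\beta_4 < 0$, and $\beta_3 \ne 0$,
$r_1$ and $r_2$ will be finite. If we set n = 0, 1, 2, 3, in (4), the solutions are

(6)
$\beta_4 = \{\alpha_3(\alpha_5 - 4\alpha_3) - (\alpha_4 - 3)(\alpha_4 - 1)\} \div 4D;$
$\beta_3 = \{\alpha_5(\alpha_4 - 3) - \alpha_3(\alpha_6 - \alpha_3^2 - 2\alpha_4 - 3)\} \div 3D;$
$\beta_2 = \{(\alpha_3 - \alpha_5)(\alpha_5 - 4\alpha_3) + (\alpha_4 - 1)(\alpha_6 - \alpha_3^2 - 3\alpha_4)\} \div 2D;$
$\beta_1 = \{\alpha_3(\alpha_6 - \alpha_3\alpha_5 - 3\alpha_4 + 3\alpha_3^2) - (\alpha_4 - 3)(\alpha_5 - \alpha_3\alpha_4)\} \div D,$

where

$D = (\alpha_6 - \alpha_4^2 - \alpha_3^2)(\alpha_4 - \alpha_3^2 - 1) - (\alpha_5 - \alpha_3 - \alpha_3\alpha_4)^2 \ge 0.$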
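
The solutions (6) can be checked against the recursion (4) with exact rational arithmetic. The following is a minimal sketch, assuming standardized moments $\alpha_0 = 1$, $\alpha_1 = 0$, $\alpha_2 = 1$; the function names `betas_from_moments` and `recursion_residuals` are hypothetical, and the second set of moments in the example is an arbitrary illustration, not data from this note.

```python
from fractions import Fraction as Fr


def betas_from_moments(a3, a4, a5, a6):
    """Evaluate the solutions (6); returns (beta4, beta3, beta2, beta1)."""
    D = (a6 - a4**2 - a3**2) * (a4 - a3**2 - 1) - (a5 - a3 - a3 * a4)**2
    b4 = (a3 * (a5 - 4 * a3) - (a4 - 3) * (a4 - 1)) / (4 * D)
    b3 = (a5 * (a4 - 3) - a3 * (a6 - a3**2 - 2 * a4 - 3)) / (3 * D)
    b2 = ((a3 - a5) * (a5 - 4 * a3)
          + (a4 - 1) * (a6 - a3**2 - 3 * a4)) / (2 * D)
    b1 = (a3 * (a6 - a3 * a5 - 3 * a4 + 3 * a3**2)
          - (a4 - 3) * (a5 - a3 * a4)) / D
    return b4, b3, b2, b1


def recursion_residuals(betas, a3, a4, a5, a6):
    """Left side minus right side of the recursion (4) for n = 0, 1, 2, 3."""
    b4, b3, b2, b1 = betas
    alpha = [1, 0, 1, a3, a4, a5, a6]  # standardized: alpha_1 = 0, alpha_2 = 1
    residuals = []
    for n in range(4):
        lhs = (4 * b4 * alpha[n + 3] + 3 * b3 * alpha[n + 2]
               + 2 * b2 * alpha[n + 1] + b1 * alpha[n])
        rhs = n * alpha[n - 1] if n > 0 else 0
        residuals.append(lhs - rhs)
    return residuals


# Normal moments: alpha_3 = 0, alpha_4 = 3, alpha_5 = 0, alpha_6 = 15.
normal = betas_from_moments(Fr(0), Fr(3), Fr(0), Fr(15))
print(normal)

# An arbitrary skew set of moments, used only to exercise the formulas.
skew = (Fr(1, 2), Fr(3), Fr(4), Fr(20))
print(recursion_residuals(betas_from_moments(*skew), *skew))
```

For the normal moments the sketch returns $\beta_2 = 1/2$ with the other three coefficients zero, as expected for the standardized normal curve.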


To prove $D \ge 0$ we adopt the method of J. E. Wilkins Jr. [4]. In only a trivial
case is D = 0. Let


$G(a, b, c, d) = \int_{r_1}^{r_2} (a + bt + ct^2 + dt^3)^2 y_t\,dt \ge 0,$


where $y_t$ is any probability function with range $r_1 \le t \le r_2$. Since G(a, b, c, d)
is a semi-definite quadratic form, its discriminant will be non-negative. But
its discriminant is easily seen to be equal to D, thus


(7)  $\begin{vmatrix} \alpha_3 & 1 & 0 & 1 \\ \alpha_4 & \alpha_3 & 1 & 0 \\ \alpha_5 & \alpha_4 & \alpha_3 & 1 \\ \alpha_6 & \alpha_5 & \alpha_4 & \alpha_3 \end{vmatrix} \ge 0.$


We summarize without proofs the essential features of the fourth degree
exponential. Near the normal point, $\alpha_4 = 3$, $\alpha_3 = 0$, the fourth degree exponential
function, the Pearson system, and the Gram-Charlier Type A are essentially
alike. Type C [5] while similar is not the same. Note that $\beta_4$ may be
negative and in such a case $r_1$ and $r_2$ are the two real zeros of the derivative of (1).
The exponential may be bimodal as well as unimodal and the normal curve is
the special case $\beta_4 = \beta_3 = \beta_1 = 0$. Various special cases where a particular $\beta$
is zero are readily handled by either (4) or (6). The graduation of both unimodal
and bimodal observed distributions will be published elsewhere.

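Let


(8)  $y_t = k \exp\Big\{-\sum_{j=1}^{r} \beta_j t^j\Big\}, \quad r_1 \le t \le r_2,$

where

(9)  $\dfrac{1}{k} = \int_{r_1}^{r_2} \exp\Big\{-\sum_{j=1}^{r} \beta_j t^j\Big\}\,dt.$

The likelihood, L, in a sample of N is given by

(10)  $L = k^N \exp\Big\{-\Big[\beta_r \sum_{i=1}^{N} t_i^r + \beta_{r-1} \sum_{i=1}^{N} t_i^{r-1} + \cdots + \beta_1 \sum_{i=1}^{N} t_i\Big]\Big\},$

where $t_i = (x_i - m)/\sigma$. Then

(11)  $\dfrac{\partial \log L}{\partial \beta_j} = \dfrac{N}{k}\,\dfrac{\partial k}{\partial \beta_j} - \sum_{i=1}^{N} t_i^j,$

and

(12)  $\dfrac{1}{k}\,\dfrac{\partial k}{\partial \beta_j} = k\Big[\int_{r_1}^{r_2} t^j \exp\Big\{-\sum_{l=1}^{r} \beta_l t^l\Big\}\,dt - \exp\Big\{-\sum_{l=1}^{r} \beta_l r_2^l\Big\}\dfrac{\partial r_2}{\partial \beta_j} + \exp\Big\{-\sum_{l=1}^{r} \beta_l r_1^l\Big\}\dfrac{\partial r_1}{\partial \beta_j}\Big].$

If we assume either $r_1$ and $r_2$ constant, or $\exp\{-\sum_{l=1}^{r} \beta_l r_1^l\}$ and $\exp\{-\sum_{l=1}^{r} \beta_l r_2^l\}$
negligible, then (12) becomes

$\dfrac{1}{k}\,\dfrac{\partial k}{\partial \beta_j} = k \int_{r_1}^{r_2} t^j \exp\Big\{-\sum_{l=1}^{r} \beta_l t^l\Big\}\,dt,$ and $\dfrac{\partial \log L}{\partial \beta_j} = 0$ implies

(13)  $\dfrac{\int_{r_1}^{r_2} t^j \exp\{-\sum_{l=1}^{r} \beta_l t^l\}\,dt}{\int_{r_1}^{r_2} \exp\{-\sum_{l=1}^{r} \beta_l t^l\}\,dt} = \dfrac{\sum_{i=1}^{N} t_i^j}{N} = a_j, \quad j = 1, 2, \ldots, r,$

where $a_j$ is the sample estimate of $\alpha_j$. For, if in $\sum t_i^j/N$ we let j = 1, 2, we find
by (13) that $m = \bar{x}$, and $\sigma^2 = \sum (x_i - \bar{x})^2/N$. The solution of (13) provides estimates
of $\beta_4, \beta_3, \beta_2$, and $\beta_1$, if we set r = 4. Naturally more time is required
for the solution of (13) as compared with the method of moments, but the maximum
likelihood estimates are asymptotically efficient. The system (13) must
be solved by successive approximations. To determine the moments solution
all we do is to replace $\alpha_j$ by $a_j$ in equations (6). This affords a point of departure
from which the maximum likelihood equations may be solved. The two methods
are not the same.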

The fourth degree exponential is readily generalized to a fourth (or rth) degree 
multivariate function including the normal multivariate function as a spe- 
cial case. 


AN APPROXIMATION TO THE BINOMIAL SUMMATION
By G. F. CRAMER 


Washington, D.C. 


We consider the binomial expansion $(q + p)^n$, where q = 1 − p and n is a
positive integer. For given values of n, p, r, and s, where $np < r < s \le n$,
we are often interested in the probability $P(r \le x \le s)$ that the number of successes
x will satisfy $r \le x \le s$.

When n does not exceed 50, we can use tables of the Incomplete Beta Function,
or other convenient and accurate tables. For "large" values of n, we can use
normal tables. When p is "small", we can use Poisson tables. However, it is
often true that p is fairly small, and yet not small enough to give really accurate
results when Poisson tables are employed in the usual way, while n is too large
for use of the tables of the Incomplete Beta Function and yet too small for accurate
use of normal tables.

It frequently happens that an upper bound for $P(r \le x \le s)$ would serve our
purpose. We propose to show how to find this from Poisson tables with greater
accuracy than could be obtained by using these tables in the ordinary way.

We shall denote the general term of the binomial expansion by $B_i = \binom{n}{i} p^i q^{n-i}$
and the general term of the corresponding Poisson distribution with the same
value of p by $P_i = (pn)^i e^{-pn}/i!$. We shall also consider a second Poisson distribution
whose general term is given by $P'_i = (p'n)^i e^{-p'n}/i!$, where $p' \ne p$
will be determined later.

We shall use the following notations:


(1)  $U_i = B_{i+1}/B_i = (n - i)p/[(i + 1)(1 - p)];$
(2)  $V_i = P_{i+1}/P_i = pn/(i + 1);$
(3)  $V'_i = P'_{i+1}/P'_i = p'n/(i + 1);$
(4)  $U_i - V_i = p(np - i)/[(i + 1)(1 - p)].$


From (4) we obtain at once the following:

LEMMA I. $U_i > V_i$ or $U_i < V_i$ according as $i < np$ or $i > np$.

Thus, the size of the general term of the binomial expansion falls off more
steeply to the right of i = np than does that of the general Poisson term.

We can use Lemma I to obtain an upper bound to $P(r \le x \le s)$ for any r > np.
In fact,


$B_r = B_r P_r/P_r;$
$B_{r+1} < B_r P_{r+1}/P_r;$
$B_{r+2} < B_{r+1} P_{r+2}/P_{r+1} < B_r P_{r+2}/P_r;$
$\cdots$
$B_s < B_r P_s/P_r.$

Adding these, we obtain


(5)  $P(r \le x \le s) = \sum_{i=r}^{s} B_i < (B_r/P_r) \sum_{i=r}^{s} P_i = (B_r/P_r)\Big(\sum_{i=r}^{\infty} P_i - \sum_{i=s+1}^{\infty} P_i\Big).$

The quantity in parentheses in (5) can be found by use of the cumulative Poisson
table provided, of course, it is within the range of that table, while the
$B_r/P_r$ can be computed directly.

In the work we have done so far, we have used a Poisson distribution which
is less steep than the corresponding binomial distribution throughout the whole
interval $np < r \le x \le n$. It seems reasonable to investigate the possibility of
improving upon (5) by using a Poisson distribution having a different value p′
in place of p, where p′ is chosen so that the new Poisson distribution is of the
same steepness at x = r as is the binomial distribution. We wish to have
$U_r = V'_r$ and $U_i < V'_i$ for all $r < i \le n$. The first of these conditions requires
that $(n - r)p/[(r + 1)(1 - p)] = p'n/(r + 1)$. Solving for p′ we obtain


(6)  $p' = (n - r)p/[n(1 - p)].$


We are now ready to prove the following: 

LEMMA II. If p′ is defined by (6) and if $U_i$, $V_i$, and $V'_i$ are defined by (1), (2),
and (3) respectively, then $U_i < V'_i < V_i$, provided r > np and i > r.

It is easy to see that $U_i/V'_i = (n - i)p(1 + i)/[(1 + i)(1 - p)(np')]$, and
this can be reduced to (n − i)/(n − r) by replacing p′ by its value from (6).
Then $U_i/V'_i < 1$ since i > r. Moreover, we have $V'_i/V_i = (p'n)(i + 1)/
[(i + 1)(pn)] = p'/p = (n - r)/(n - np)$. But r > np and hence $V'_i < V_i$.
This completes the proof of Lemma II.

We are now in a position to obtain an inequality somewhat better than (5).
The derivation of the new upper bound for $P(r \le x \le s)$ goes just as before
except that each $P_i$ is replaced by $P'_i$. We obtain the new inequality


(7)  $P(r \le x \le s) < K' B_r/P'_r,$

where $K' = \sum_{i=r}^{\infty} P'_i - \sum_{i=s+1}^{\infty} P'_i$.


We can get a lower bound as well as a somewhat improved upper bound for
$P(r \le x \le s)$ by calculating $B_r$ and $B_{r+1}$ directly and then applying (5) or (7)
to find an upper bound M of $P(r + 1 \le x \le s)$. This gives the inequality

(8)  $B_r + B_{r+1} < P(r \le x \le s) < B_r + M.$


This could, of course, be still further improved by calculating directly still more
of the $B_i$'s and using a similar procedure, but one would not care to carry this
very far.

To illustrate the various approximations, we have worked out a numerical 
example the results of which appear below. For convenience in checking, we 
have used a value of n which is within the range of the tables of the Incomplete 
Beta Function, even though we would ordinarily use our method only for larger 
values of n. 

EXAMPLE. s = n = 40; r = 10; p = 1/10; p′ = 1/12. The tables of the
Incomplete Beta Function give $P(10 \le x \le 40) = .0050631$. Using Poisson
tables in the usual way, we get P(10, 4) − P(40, 4) = .008132, which is not
particularly good. Using inequality (5) we obtain: $B_{10}/P_{10} = .6790$ and
$P(10 \le x \le 40) < .6790(.008132) = .005522$. Using (8) and calculating both
$B_{10}$ and $B_{11}$, we take r = 11 in the inequality (5) and obtain $B_{10} = .0035934$,
$B_{11} = .0010889$, P(11, 4) − P(40, 4) = .002840, $B_{11}/P_{11} = .5657$, and hence
$.004682 < P(10 \le x \le 40) < .003594 + .001607 = .00520$. Again using
method (8), but calculating $B_{12}$ also and using r = 12 in inequality (5), we get
$.004974 < P(10 \le x \le 40) < .005099$, which is quite good. We can obtain a
still better result by using inequality (7) instead of (5). Then p′ = 1/12,
np′ = 10/3, $B_{10}/P'_{10} = 2.150+$, P(10, 10/3) − P(40, 10/3) = .002366, and
$P(10 \le x \le 40) < .005087$.
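
The bounds (5) and (7) can be recomputed directly instead of being read from tables. The following is a minimal sketch using the example's values n = 40, p = 1/10, r = 10, s = 40; the function names `binom_term`, `poisson_term`, `exact_sum` and `upper_bound` are hypothetical, and the Poisson sums run over i = r, ..., s as in the derivation of (5).

```python
from math import comb, exp, factorial


def binom_term(n, p, i):
    """General binomial term B_i."""
    return comb(n, i) * p**i * (1 - p)**(n - i)


def poisson_term(mean, i):
    """General Poisson term with the given mean."""
    return mean**i * exp(-mean) / factorial(i)


def exact_sum(n, p, r, s):
    """P(r <= x <= s) by direct summation of binomial terms."""
    return sum(binom_term(n, p, i) for i in range(r, s + 1))


def upper_bound(n, p, r, s, mean):
    """(B_r / P_r) times the Poisson sum from r to s, as in (5) and (7)."""
    ratio = binom_term(n, p, r) / poisson_term(mean, r)
    return ratio * sum(poisson_term(mean, i) for i in range(r, s + 1))


n, p, r, s = 40, 0.1, 10, 40
p_prime = (n - r) * p / (n * (1 - p))  # equation (6)

exact = exact_sum(n, p, r, s)
bound5 = upper_bound(n, p, r, s, n * p)        # inequality (5), mean np
bound7 = upper_bound(n, p, r, s, n * p_prime)  # inequality (7), mean np'
print(exact, bound5, bound7)
```

Both bounds should land close to the final figures quoted in the example, with the bound from (7) the tighter of the two.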


Presented at the Madison Meeting of the Institute, September 7-10, 1948


1. On Distribution-free Confidence Intervals (Preliminary Report). WASSILY
HOEFFDING, University of North Carolina, Chapel Hill.


Let $\theta(F)$ be a functional of a distribution function (d.f.) F(x) (where x is a real number
or a vector), defined over a class $\mathcal{D}$ of d.f.'s; $O_n$ a random sample from a population with
d.f. F(x); $\underline{\theta}_n \le \bar{\theta}_n$ two functions of $O_n$; and $\alpha_n = \Pr\{\underline{\theta}_n \le \theta(F) \le \bar{\theta}_n\}$. Conditions are studied
under which, given $\alpha$, $0 < \alpha < 1$, we have either $\alpha_n = \alpha$ or $\alpha_n \ge \alpha$ or $\alpha_n \to \alpha$, for all F(x)
in $\mathcal{D}$, where $\mathcal{D}$ is defined independently of the functional form of F(x). Under fairly general
conditions we can obtain by "studentization" confidence limits $\underline{\theta}_n$, $\bar{\theta}_n$ such that
$\lim_{n \to \infty} \alpha_n = \alpha$, and $\gamma = \lim_{n \to \infty} E\sqrt{n}\,(\bar{\theta}_n - \underline{\theta}_n)$ exists; $\gamma$ is minimized by using a least variance estimate
of $\theta(F)$. If there exists a function $\kappa(\theta)$ such that var $T_n \le \kappa^2(\theta) n^{-1}$ if $\theta(F) = \theta$, for all F
in $\mathcal{D}'$, we can define confidence limits with a positive lower bound for $\alpha_n$. This applies to a
number of population characteristics estimated by rank order statistics, such as the
coefficients $\rho'$ and $\tau$ (estimated by Spearman's and Lindeberg-Kendall's rank correlation
coefficients, respectively). In certain cases (including $\rho'$ and $\tau$), $\theta(F)$ admits a binomially
distributed estimate; then exact confidence limits can easily be obtained. This research
was done under an Office of Naval Research contract.


2. On Certain Statistics for Samples of 3 from a Normal Population. JULIUS
LIEBLEIN, National Bureau of Standards, Washington.


In analytical chemistry three determinations are frequently made. Sometimes the
average of only the two closest results is reported, the remaining observation being rejected
as anomalous. In preparing a critique of this procedure, Dr. W. J. Youden encountered
a need for information on certain properties of the distributions of the statistics
$(x' - x'')/(x_3 - x_1)$, $(x' + x'')/2$, and $(x' - x'')/2$, where x′ and x″ (x′ > x″) are the two
closest of the three determinations. This paper shows how these statistics differ from the
ones heretofore treated involving "fixed" order statistics; gives the distribution of these
statistics in random samples of 3 from a normal universe; and lists values of certain of the
moments of their distributions.


3. On Multinomial Distributions with Limited Freedom: A Stochastic Genesis
of Pareto's and Pearson's Curves. MARIA CASTELLANI, University of
Kansas City.


The purpose of this paper is to investigate the most probable configuration of N random
elements to be distributed in K (K < N) class intervals, where known forces are acting.
We shall call these intervals of energy, using the terminology of statistical mechanics.

We will prove that the most probable configuration is a configuration of statistical equi- 
librium since its probability of occurring converges to 1 as N becomes infinitely large. 

The main purpose of this paper is to discover which forces of attraction, operating in 
the intervals of energy, give Pareto’s and Pearson’s curves when statistical equilibrium 
is reached. 

We will consider a random variable Y(t), t being an independent variable, obeying a
multinomial distribution law with limited freedom, and we will exploit the familiar process 
of statistical mechanics. The equation of the frequency curves corresponding to the equi- 
librium stage of the statistical experiment will be shown. 
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4. Fitting Generalized Truncated Normal Distributions. HAaroutp Hore..ina, 
University of North Carolina, Chapel Hill. 


In a sample from a p-dimensional normal distribution only those individuals are supposed
to be observed which fall in a specified but arbitrary set A of positive measure. For esti- 
mating the parameters the method of moments is proved equivalent to that of maximum 
likelihood and therefore efficient. The problem is thus reduced to that of expressing the 
parameters of the normal distribution in terms of the moments of the truncated distribu- 
tion. This however is not generally possible in simple explicit form. Methods are pre- 
sented for dealing numerically with several special cases, including those in which A is a 
linear interval or a parallelogram. 


5. On the Distribution of the Two Closest Observations Among a Set of Three
Independent Observations. G. R. Seth, Iowa State College.


Let $x_1, x_2, x_3$ ($x_1 < x_2 < x_3$) be three independent ordered observations from a population
having a probability density function $f(x)$. Let $x'$, $x''$ ($x' < x''$) be the two closest; then the
probability density function of $x'$, $x''$ is given by

$$6\, f(x')\, f(x'') \left[\, 1 + F(2x'' - x') - F(2x' - x'') \,\right],$$

where

$$F(x) = \int_x^{\infty} f(t)\, dt.$$

In the case $f(x)$ is a normal distribution with unit variance, the joint distribution
of $y = x'' - x'$ and $z = \dfrac{x'' - x'}{x_3 - x_1}$ is obtained as

$$\frac{2\sqrt{3}}{\pi}\, \frac{y}{z^2}\, \exp\!\left[ -\frac{y^2 (1 - z + z^2)}{3 z^2} \right].$$


This problem is of interest in cases where the conclusions are to be based on a set of 
three observations and one of the observations is to be rejected in the analysis of the data. 
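
As an informal check on the normal-case result, the sketch below simulates samples of three and compares the observed frequency of $z \le z_0$ with a closed form. The closed form is not stated in the abstract; it is an assumption obtained here by integrating the joint density $\frac{2\sqrt{3}}{\pi}\frac{y}{z^2}\exp[-y^2(1 - z + z^2)/(3z^2)]$ over $y$, which gives the marginal density $3\sqrt{3}/[\pi(1 - z + z^2)]$ on $0 < z \le 1/2$. The function names are hypothetical.

```python
import math
import random


def closest_pair_ratio(a, b, c):
    """Return z = (x'' - x') / (x_3 - x_1) for three observations."""
    lo, mid, hi = sorted((a, b, c))
    gap = min(mid - lo, hi - mid)  # y = x'' - x', the smaller of the two gaps
    return gap / (hi - lo)


def z_cdf(z):
    """Assumed CDF of z on (0, 1/2]: integral of 3*sqrt(3) / (pi * (1 - t + t^2))."""
    return (6.0 / math.pi) * (math.atan((2.0 * z - 1.0) / math.sqrt(3.0)) + math.pi / 6.0)


def simulate(z0, trials=50_000, seed=1948):
    """Monte Carlo estimate of Pr(z <= z0) for unit normal samples of three."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        z = closest_pair_ratio(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        if z <= z0:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for z0 in (0.1, 0.25, 0.4):
        print(z0, round(simulate(z0), 4), round(z_cdf(z0), 4))
```

Because $z$ is a ratio of differences, the comparison does not depend on the mean or variance chosen for the simulated normal population.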


6. The Derivation of Certain Recurrence Formulae and their Application to the 
Extension of Existing Published Incomplete Beta Function Tables. T. A. 
Bancroft, Alabama Polytechnic Institute, Auburn (presented by title).


The objects of the paper are: (1) to give a number of new recurrence formulae in the in- 
complete beta function derived by a new method, and (2) to indicate how these new formulae 
have been used to obtain new tables of the incomplete beta function that are outside the 
range of the p and q values given in the existing published tables. 

The recurrence formulae have been derived by considering the incomplete beta function 
as a special case of the hypergeometric series, thus 


$$B_x(p, q) = \frac{x^p}{p}\, F(p,\, 1 - q,\, p + 1,\, x),$$

where the usual form of the hypergeometric series is

$$F(a, b, c, x) = 1 + \frac{a \cdot b}{c}\,\frac{x}{1!} + \frac{a(a + 1) \cdot b(b + 1)}{c(c + 1)}\,\frac{x^2}{2!} + \frac{a(a + 1)(a + 2) \cdot b(b + 1)(b + 2)}{c(c + 1)(c + 2)}\,\frac{x^3}{3!} + \cdots.$$

This series converges for $|x| < 1$, and for $x = 1$ if and only if $a + b < c$. Certain recurrence
formulae for $F(a, b, c, x)$ are then directly converted for use with $B_x(p, q)$, or in the so-called
normalized form $I_x(p, q)$, provided $c = a + 1$. All conditions have been satisfied by setting
$a = p$, $b = 1 - q$, $c = p + 1$, and $q > 0$.

For example, using the above mentioned methods we may obtain, among many others, the 
recurrence formulae: 


(i) $x I_x(p, q) - I_x(p + 1, q) + (1 - x) I_x(p + 1, q - 1) = 0,$
(ii) $(p + q - px) I_x(p, q) - q I_x(p, q + 1) - p(1 - x) I_x(p + 1, q - 1) = 0,$
(iii) $q I_x(p, q + 1) + p I_x(p + 1, q) - (p + q) I_x(p, q) = 0.$


Formula (i) is essentially the basic recurrence formula used to obtain Karl Pearson’s
tables. An indication of formula (iii) in another form was given by the author in the paper 
‘‘On Biases in Estimation Due to the Use of Preliminary Tests of Significance,’’ Annals of 
Math. Stat., Vol. 15 (1944), p.194, and a direct proof was later given by the author in ‘‘ Note 
on an Identity in the Incomplete Beta Function,”’ Annals of Math. Stat. Vol. 16 (1945), pp. 
98-99. All of the material in the present paper, however, is new, including recurrence form- 
ulae and tables and the mathematical method of derivation. 
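
The three recurrence formulae can be checked numerically. The sketch below is not the author's method of derivation: it evaluates $I_x(p, q)$ by composite Simpson integration (an assumption made only to keep the example self-contained, and suitable only for $p, q \ge 1$) and returns the left-hand sides of (i), (ii) and (iii), each of which should be close to zero. The function names are hypothetical.

```python
import math


def incomplete_beta_ratio(x, p, q, steps=2000):
    """I_x(p, q) by composite Simpson's rule; assumes p >= 1, q >= 1, 0 <= x < 1."""
    def integrand(t):
        return t ** (p - 1) * (1 - t) ** (q - 1)

    h = x / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    complete_beta = math.gamma(p) * math.gamma(q) / math.gamma(p + q)
    return total * h / 3 / complete_beta


def recurrence_residuals(x, p, q):
    """Left-hand sides of formulae (i), (ii), (iii); requires q >= 2 here."""
    I = incomplete_beta_ratio
    r1 = x * I(x, p, q) - I(x, p + 1, q) + (1 - x) * I(x, p + 1, q - 1)
    r2 = ((p + q - p * x) * I(x, p, q) - q * I(x, p, q + 1)
          - p * (1 - x) * I(x, p + 1, q - 1))
    r3 = q * I(x, p, q + 1) + p * I(x, p + 1, q) - (p + q) * I(x, p, q)
    return r1, r2, r3


if __name__ == "__main__":
    print(recurrence_residuals(0.3, 2.5, 3.5))
```

In practice a formula such as (iii) would be rearranged to step a table outward in $p$ or $q$ from tabulated values; the residual form is used here only because it makes the check direct.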


7. Asymptotic Studentization in Testing of Hypotheses. Herman CHERNOFF, 
Cowles Commission for Research in Economics. 


If $H$ is a hypothesis for which $t < c_1(\theta)$ would be a good test if the value of the nuisance
parameter $\theta$ were known and $\hat{\theta}$ is an estimate of $\theta$, then the following method of asymptotic
studentization (obtaining critical regions of almost constant size) was suggested by Wald.
Consider $t < c(\hat{\theta})$ where $c(\hat{\theta}) = c_1(\hat{\theta}) + \cdots + c_r(\hat{\theta})$ and $\Pr\{t < c_1(\theta)\} = \alpha$,
$\Pr\{t - c_1(\hat{\theta}) < c_2(\theta)\} = \alpha$, $\cdots$, $\Pr\{t - c_1(\hat{\theta}) - \cdots - c_r(\hat{\theta}) < c_{r+1}(\theta)\} = \alpha$. It is shown that under
reasonable conditions this test, and various modifications designed for those cases where the $c_i(\theta)$
are difficult to obtain exactly, have the asymptotic property that $\Pr\{t < c(\hat{\theta})\} =
\alpha + O(N^{-r/2})$ where $N$ is the size of the sample involved or an analogous variable. This
property can be extended to the case where $\theta$ is a k-dimensional variable.


8. Completeness, Similar Regions, and Unbiased Estimation. (Preliminary
Report.) Erich L. Lehmann and Henry Scheffé, University of California
at Los Angeles.


A family $\mathfrak{M}$ of measures $M$ on a space $X$ of points $x$ is defined to be complete
if $\int_X f(x)\, dM = 0$ for every $M$ in $\mathfrak{M}$ implies $f(x) = 0$ except on a set $A$ for which $M(A) = 0$ for
every $M$ in $\mathfrak{M}$. For a given family of measures the question of completeness may be re-
garded as the question of unicity of a related functional transform. Classical unicity re-
sults are applicable to many families of probability distributions that have been studied by
statisticians. The notion of completeness throws light on the problem of similar regions
and the problem of unbiased estimation. The concept of a maximal sufficient statistic,
roughly, a sufficient statistic that is a function of all other sufficient statistics, is developed.
A constructive method of finding such is given, which seems to apply to all examples
ordinarily considered in statistical theory. A relation between completeness and maximality
is found.


9. On a Proposed Method for Estimating Populations. Cecil C. Craig, Uni-
versity of Michigan, Ann Arbor.


It was proposed to the author by a biologist that a method be devised for estimating the 
total population in an area which shall utilize the minimum distances between randomly 
chosen individuals and their neighbors in directions lying in each of the four quadrants.
Assuming that the area is a square and that the distribution law over it is rectangular, it 
turns out that the complete distribution of the lengths of sides of minimum squares which 
contain a second individual is simpler than that of minimum distances. In both cases a 
simple estimate is found which uses most but not all of the information in a sample and 
whose efficiency is comparable to that based on a complete enumeration of a sample area, 
though such an enumeration is not always possible. 


10. Some Results on the Asymptotic Distribution of Maximum- and Quasi- 
Maximum-likelihood Estimates. Herman Rubin, Institute for Advanced
Study. 


The author investigates the asymptotic normality of maximum- and quasi-maximum- 
likelihood estimates of parameters of systems of linear stochastic difference equations. 
The principal tool is the extension of the Central Limit Theorem to dependent variables pre- 
viously obtained by the author (presented to the American Mathematical Society in April, 
1948). The results obtained are analogous to those in the case in which no differences are 
present. Some extensions are also made to systems of stochastic difference equations linear 
in the coefficients but not necessarily in the variables. If the complete system of stochastic 
difference equations is linear in the jointly dependent variables, asymptotic efficiency is
demonstrated for maximum-likelihood estimates. 


11. The Probability Points of the Distribution of the Median in Random Samples 
from Any Continuous Population. Churchill Eisenhart, Lola S. Deming,
and Celia S. Martin, National Bureau of Standards, Washington.


The abscissa of the (one-tail) $\epsilon$-probability point of the distribution of the median in
random samples of size $n = 2m + 1$ ($m > 0$) from any continuous population is identical
with the abscissa of the corresponding $P_{\epsilon,n}$-probability point of the parent distribution,
where $P_{\epsilon,n}$ is determined by

$$\sum_{k=\frac{1}{2}(n+1)}^{n} \binom{n}{k} P_{\epsilon,n}^{\,k} (1 - P_{\epsilon,n})^{n-k} = \epsilon \qquad (0 < \epsilon < 1). \tag{1}$$

From (1) it follows that

$$P_{1-\epsilon,n} = 1 - P_{\epsilon,n} \tag{2}$$
and that 


$$P_{\epsilon,n} = x_{\epsilon}(n + 1,\, n + 1) = \frac{1}{1 + F_{\epsilon}(n + 1,\, n + 1)} = \frac{1}{1 + e^{2 z_{\epsilon}(n + 1,\, n + 1)}}, \tag{3}$$

where $x_{\epsilon}(\nu_1, \nu_2)$, $F_{\epsilon}(\nu_1, \nu_2)$, and $z_{\epsilon}(\nu_1, \nu_2)$ denote the $\epsilon$-probability points of the incomplete-
beta-function distribution, Snedecor’s F-distribution and Fisher’s z-distribution, for
$\nu_1 (= 2q)$ and $\nu_2 (= 2p)$ ‘degrees of freedom’, respectively. The foregoing results are cer-
tainly not ‘‘new’’: Harry S. Pollard implicitly utilized the first equality on the extreme left 
of (3) in his doctoral dissertation at the University of Wisconsin in 1933 (see Annals of 
Math. Stat., Vol. 5 (1934), p. 250), and John H. Curtiss has given the generalization of (1) 
appropriate to the case of the ‘rth. position’ in random samples from any continuous popu- 
lation (see Amer. Math. Monthly, Vol. 50 (1943), p. 103) and utilized (3) explicitly to obtain 
the 5% point of the distribution of the median in random samples of size n = 23. The aim 
of the present paper is to give these results somewhat greater publicity—they are hardly 
“well known”. To this end a table (Table 1) is given of the values of $P_{\epsilon,n}$ to 5 significant
figures for $\epsilon$ = 0.001, 0.005, 0.01, 0.025, 0.05, 0.10, 0.20, 0.25 and n = 3(2)15(10)95, together
with expressions from which $P_{\epsilon,n}$ can be evaluated accurately and conveniently for values
of n (and $\epsilon$) not included in the table. Numerical examples illustrate the use of the table
and formulas. Concise derivations of the fundamental relations and formulas are given
in an appendix.
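
For values of n and $\epsilon$ outside a table, relation (1) can also be solved directly. The sketch below is one possible approach and not the expressions given in the paper: it reads (1) as a sum from $k = \frac{1}{2}(n+1)$ to $n$ of binomial terms set equal to $\epsilon$, and finds $P_{\epsilon,n}$ by bisection, using the fact that this sum increases from 0 to 1 as P goes from 0 to 1. The function names are hypothetical.

```python
from math import comb


def median_level(P, n):
    """Left side of (1): sum over k = (n+1)/2, ..., n of C(n,k) P^k (1-P)^(n-k)."""
    start = (n + 1) // 2
    return sum(comb(n, k) * P ** k * (1 - P) ** (n - k) for k in range(start, n + 1))


def solve_P(eps, n, tol=1e-12):
    """Bisection for P_{eps,n}; n must be odd, n = 2m + 1 with m > 0."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be odd and at least 3")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if median_level(mid, n) < eps:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for n in (3, 5, 23):
        print(n, solve_P(0.05, n), solve_P(0.95, n))
```

The printed pairs can be used to confirm relation (2) numerically, since the two values for each n should sum to one.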


12. On the Arithmetic Mean and the Median in Small Samples from the Normal
and Certain Non-Normal Populations. Churchill Eisenhart, Lola S.
Deming, and Celia S. Martin, National Bureau of Standards, Washington.


Let $\bar{x}_{\epsilon,n}$ and $\tilde{x}_{\epsilon,n}$ denote the abscissae of the one-tail $\epsilon$-probability points of the arith-
metic mean and the median, more specifically, the abscissae exceeded with probability $\epsilon$
by the mean and the median, respectively, in random samples of size n (= 2m + 1) from
any specified population, and let $\sigma_{\bar{x}_n}$ and $\sigma_{\tilde{x}_n}$ denote the standard deviations of the mean and
the median in such samples, respectively. The following symmetrical populations with
zero location parameters and unit scale parameters are considered in this paper:





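Type                                    Density                                   Range
normal (Gaussian)                       $\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$      $-\infty < x < \infty$
double-exponential (Laplace)            $\frac{1}{2}\, e^{-|x|}$                  $-\infty < x < \infty$
rectangular (uniform)                   $1$                                       $-\frac{1}{2} < x < \frac{1}{2}$
Cauchy                                  $\frac{1}{\pi (1 + x^2)}$                 $-\infty < x < \infty$
sech                                    $\frac{1}{\pi} \operatorname{sech} x$     $-\infty < x < \infty$
sech² (derivative of “logistic”)        $\frac{1}{2} \operatorname{sech}^2 x$     $-\infty < x < \infty$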


Using the basic table, relating probability points of the distribution of the median to prob-
ability points of the parent distribution, given in Churchill Eisenhart, Lola S. Deming and
Celia S. Martin, “The probability points of the distribution of the median in random samples
from any continuous population,” values of $\tilde{x}_{\epsilon,n}$ for random samples from each of the
above distributions have been evaluated, and are tabulated to 5 decimal places in the pres-
ent paper, for n = 3(2)15(10)95 and $\epsilon$ = 0.001, 0.005, 0.01, 0.025, 0.05, 0.10, 0.20, 0.25.

In the case of the normal distribution, values of $\bar{x}_{\epsilon,n}$ to 5 decimal places are given also for
the aforementioned combinations of $\epsilon$ and n. Comparison of the values of $\bar{x}_{\epsilon,n}$ and $\tilde{x}_{\epsilon,n}$
gives precise numerical meaning to the well-known lesser accuracy of the median as an
estimator of the center of a normal population, for samples of any odd size (n = 2m + 1).
Values of the ratio $R_{\epsilon,n} = \tilde{x}_{\epsilon,n}/\bar{x}_{\epsilon,n}$ are given also for this case (normal population), to 4
decimal places for the above combinations of $\epsilon$ and n, together with the best available values
of $\sigma_{\tilde{x}_n}/\sigma_{\bar{x}_n}$ for n = 3(2)15(10)55. When $0 < \epsilon < 0.025$, the ratio $R_{\epsilon,n}$ exceeds the ratio $\sigma_{\tilde{x}_n}/\sigma_{\bar{x}_n}$,
showing that the ‘tails’ of the exact distribution of the median are ‘longer’ than the tails of
the normal distribution with the same mean and standard deviation; and, when $0.05 < \epsilon <
0.25$, the ratio $R_{\epsilon,n}$ is less than $\sigma_{\tilde{x}_n}/\sigma_{\bar{x}_n}$. (A theoretical argument shows that the point
of equality is close to the 0.042-probability point.) A method for computing $\sigma_{\tilde{x}_n}$, based on
the foregoing, is given that is believed to be accurate to $0.001/\sqrt{n}$, or better for n > 3.

In the case of the double-exponential distribution, values of $\bar{x}_{\epsilon,n}$ are given to 4 decimal
places for n = 3(2)11, and $\epsilon$ = 0.005, 0.01, 0.025, 0.05, 0.10, 0.25, for comparison with the cor-
responding values of $\tilde{x}_{\epsilon,n}$. It is found that when n = 3, $\bar{x}_{\epsilon,3} < \tilde{x}_{\epsilon,3}$ for $\epsilon$ = 0.005, 0.01,
and 0.025, indicating that in random samples of 3 from a double-exponential distribution
the arithmetic mean furnishes narrower confidence limits for the center of the distribution
at 0.95, 0.98, and 0.99 levels of confidence. When n = 5, the mean is ‘better’ at the 0.98 and
0.99 levels of confidence; and, when n = 7, at the 0.99 level. For all other combinations of
$\epsilon$ and n (> 3), the median is ‘better.’

In the case of the rectangular distribution, values of $\bar{x}_{\epsilon,n}$ are tabulated to 4 decimals for
n = 3(2)9, and values of the $\epsilon$-probability point of the mid-range in samples of n,
for n = 3(2)15(10)95, in each instance for $\epsilon$ = 0.005, 0.01, 0.025, 0.05, 0.10, 0.25, and in the case
of $\bar{x}_{\epsilon,n}$ for $\epsilon$ = 0.001 also. The superiority of the midrange over the mean and the median,
well-known but here exhibited numerically for the first time, is truly amazing.

It is planned to provide values of $\bar{x}_{\epsilon,n}$ for samples from the sech and sech² distributions in
the final paper.


13. The Relative Frequencies with which Certain Estimators of the Standard 
Deviation of a Normal Population Tend to Underestimate its Value. 
Churchill Eisenhart and Celia S. Martin, National Bureau of Standards,
Washington. 


Let $x_1, x_2, \ldots, x_n$ denote a random sample of n independent observations from a normal
population with mean $\mu$ and standard deviation $\sigma$. Common estimators of $\sigma$ are

$$s_1 = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / n}, \qquad s_2 = s_1 \sqrt{n/(n - 1)}, \qquad s_3 = s_1/c_2,$$

$$m_1 = \sqrt{\frac{\pi}{2}}\, \sum_{i=1}^{n} |x_i - \bar{x}| / n, \qquad m_2 = m_1 \sqrt{n/(n - 1)},$$

and $R_1 = (x_L - x_S)/d_2$, where $\bar{x} = \sum_{i=1}^{n} x_i/n$, $x_L$ is the largest and $x_S$ the smallest of the
$x$'s, $c_2 = E(s_1)$, and $d_2 = E(x_L - x_S)$, the symbol $E(\ )$ denoting “mathematical expectation
(or mean value) of.” A table is given that shows to 3 decimals the relative frequencies
(probabilities) with which these estimators tend to underestimate $\sigma$ when n = 2(1)10, 12,
15, 20, 24, 30, 40, 60. The results show among other things that, for very small samples
(n < 10) such as chemists and physicists commonly use, Bessel’s formula for the probable
error, which is based on $s_2$, has a marked downward bias in the probability sense (in addi-
tion to its known slight downward bias in the mean value sense), whereas Peters’ formula,
which is based on $m_2$, has only a slight downward bias in the probability sense and no bias
in the mean value sense. A table of divisors is given by means of which “median estima-
tors” of $\sigma$ can be computed readily from the basic quantities $\sum_{i=1}^{n} (x_i - \bar{x})^2$, $\sum_{i=1}^{n} |x_i - \bar{x}|$, and
$(x_L - x_S)$, that is, estimators that will over- and underestimate $\sigma$ equally often in repeated
use. An application to control charts is noted. Median estimators, like maximum likeli-
hood estimators (“modal estimators”) have the useful property that if $T$ is a median esti-
mator of $\theta$, then $f(T)$ is a median estimator of $f(\theta)$, a property unfortunately not possessed
by the customary “unbiased” (“mean”) estimators.
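
The kind of relative frequency tabulated in the paper can be approximated by simulation. The sketch below is a Monte Carlo illustration only, not the exact computation behind the published table: it draws normal samples, forms $s_2 = \sqrt{\sum (x_i - \bar{x})^2/(n-1)}$ and $m_2 = \sqrt{\pi/2}\, \sum |x_i - \bar{x}| / \sqrt{n(n-1)}$, and counts how often each falls below $\sigma$. The function names, trial count, and seed are arbitrary assumptions.

```python
import math
import random


def s2_and_m2(sample):
    """Return (s2, m2), the estimators underlying Bessel's and Peters' formulas."""
    n = len(sample)
    mean = sum(sample) / n
    s1 = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    s2 = s1 * math.sqrt(n / (n - 1))
    m1 = math.sqrt(math.pi / 2) * sum(abs(x - mean) for x in sample) / n
    m2 = m1 * math.sqrt(n / (n - 1))
    return s2, m2


def underestimate_frequencies(n, sigma=1.0, trials=40_000, seed=13):
    """Estimate Pr(s2 < sigma) and Pr(m2 < sigma) for normal samples of size n."""
    rng = random.Random(seed)
    low_s2 = low_m2 = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2, m2 = s2_and_m2(sample)
        low_s2 += s2 < sigma
        low_m2 += m2 < sigma
    return low_s2 / trials, low_m2 / trials


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, underestimate_frequencies(n))
```

For $s_2$ the simulated frequency can be compared with an exact value, since $s_2 < \sigma$ is equivalent to a chi-square variable with $n - 1$ degrees of freedom falling below $n - 1$; for n = 5 this probability is $1 - 3e^{-2}$.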


14. Some Non-Parametric Tests of Whether the Largest Observations of a Set 
are too Large. (Preliminary Report.) John E. Walsh, Douglas Aircraft
Company, Santa Monica, California. 

Let $x(1), \ldots, x(n)$ represent the values of n observations arranged in increasing order of
magnitude. By hypothesis these observations have the properties: (1) They are independ-
ent and from continuous symmetrical populations. (2) For large n the variances of the tail
order statistics are either very large or very small compared with the variances of the cen-
tral order statistics. (3) For large n the tail order statistics are approximately independent
of the central order statistics. (4) Each observation is from a population whose median is
either $\theta$ or $\varphi$, where $x(n - r + 1), \ldots, x(n)$ are from populations with median $\theta$ while the
central and smaller order statistics are from populations with median $\varphi$. The test is:
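Accept $\varphi < \theta$ if $\min\,[x(n - i_k) + x(j_k);\ 1 \le k \le s \le r] > 2x(t_\alpha)$, where $i_u < i_{u+1}$, $j_v < j_{v+1}$,
$i_s = r - 1$, and $t_\alpha$ is defined by $\Pr\,[x(t_\alpha) < \varphi \mid \theta = \varphi] = \alpha$. Here

$$\alpha = \Pr\,\{\min\,[x(n - i_k) + x(j_k);\ 1 \le k \le s \le r] > 2\varphi \mid \theta = \varphi\}.$$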


For large n the significance level of the test is approximately $\alpha$ while the significance level
does not exceed $2\alpha$ for any value of n. Suitable values of $\alpha$ can be obtained for r > 4. As
$\theta - \varphi \to -\infty$ the power function tends to zero, while the power function tends to unity as
$\theta - \varphi \to \infty$. For $\theta - \varphi < 0$ the power function is monotonically increasing.


15. On the Bounded Significance Level Properties of the Equal-tail Sign Test 
for the Mean. John E. Walsh, Douglas Aircraft Company, Santa Monica,
California, (Presented by Title). 


The equal-tail sign test for deciding whether the population mean $\mu$ is equal to a given
hypothetical value $\mu_0$ is defined by: Accept $\mu \ne \mu_0$ if either $x_i < \mu_0$ or $x_{n+1-i} > \mu_0$, $\left(i > \frac{n+1}{2}\right)$.
Here $x_j$, ($j = 1, \ldots, n$), is the $j$th largest of n independent observations drawn from n
populations which satisfy the conditions: (i) The mean of each population has the value $\mu$.
(ii) Each population is continuous at its mean. (iii) The mean is at a 50% point for each 
population. This paper investigates how the significance level of the equal-tail sign test 
varies when (i)-(iii) are not satisfied. It is found that the significance level does not differ 
noticeably from its hypothetical value under conditions much more general than (i)-(iii). 
This significance level stability, combined with the properties of being easily applied and 
reasonably efficient for small samples from a normal population, suggests that the equal- 
tail sign test be considered for application whenever the population mean is to be tested on 
the basis of a small number of observations. 


16. Infinitely Divisible Distributions. William Feller, Cornell University,
Ithaca, New York. 


A simple derivation of P. Lévy’s formula is given starting from the following definition: 
a distribution function $F(x)$ is infinitely divisible if for every $n$ it is possible to find finitely
many distributions $F_{k,n}(x)$ such that $F(x) = F_{1,n}(x) * \cdots * F_{k_n,n}(x)$ and that $F_{k,n}(x)$ tends
to the unitary distribution uniformly in $n$. This definition is more general than the one
used by P. Lévy and Khintchine. The equivalence of the two definitions was proved by 
Khintchine by deep methods. The new approach renders the equivalence obvious. Fur- 
thermore, a new characterization of infinitely divisible distributions is given; it is equiva- 
lent to Gnedenko’s characterization but requires no special analytical tools. 


17. Fluctuation Theory of Recurrent Events. William Feller, Cornell Uni-
versity, Ithaca, New York.


Consider a sequence of independent or dependent trials but suppose that each has a dis-
crete sample space. The paper studies recurrent patterns $\mathcal{E}$ which can be roughly charac-
terized by the property that after every occurrence of $\mathcal{E}$ the process starts from scratch,
the conditional probabilities coinciding with the original absolute probabilities. Typical
examples are success runs, returns to equilibrium, zeros of sums of independent variables,
passages through a state in a Markov chain. New methods are developed unifying and
simplifying previous theories and applying to larger classes of recurrent events. It is shown
in an elementary way that the probability that $\mathcal{E}$ occurs at the n-th trial either has a limit or is
asymptotically periodic. This theorem has many consequences. For example, the ergodic
properties of discrete Markov chains follow in a few lines, and the difference between finite
and infinite chains disappears. Several theorems of the renewal type are proved. Weak
and strong limit theorems for the number $N_n$ of occurrences of $\mathcal{E}$ in $n$ trials are derived,
shedding new light on stable distributions.


18. Formulas for the Percentage Points of the Distributions of the Arithmetic 
Mean in Random Samples from Certain Symmetrical Universes. Uttam
Chand, University of North Carolina and National Bureau of Standards.


Using the method of Fisher and Cornish, the 100$\epsilon$% point of the distribution of the arith-
metic mean in random samples of size N from any universe having finite cumulants of the
first four orders, $\kappa_1, \kappa_2, \kappa_3, \kappa_4$, is expressed to order $1/N^2$ as a function of N, the 100$\epsilon$% point
of a standardized normal deviate and the quantities $\kappa_1$, $\kappa_2$, $\kappa_3/\kappa_2^{3/2}$, $\kappa_4/\kappa_2^{2}$. The numerical
coefficients are evaluated for the cases of sampling from rectangular, double-exponential,
sech and sech² distributions. The application of the resulting formulas is illustrated nu-
merically for $\epsilon$ = .001, .005, .010, .025, .050, .100, and .250. In the case of the rectangular
and double-exponential distributions, the results obtained for N = 10 are compared with
accurate values, indicating the accuracy of the formulas.
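
A rough illustration of an expansion of this kind is sketched below. It is not the formula of the paper, which is carried to order $1/N^2$: the sketch keeps only the leading Cornish-Fisher correction for a symmetrical universe ($\kappa_3 = 0$), namely $\kappa_1 + \sqrt{\kappa_2/N}\,[\,z + \frac{\gamma_2}{24N}(z^3 - 3z)\,]$ with $\gamma_2 = \kappa_4/\kappa_2^2$, and it assumes that the 100$\epsilon$% point means the abscissa exceeded with probability $\epsilon$. The function name and the dictionary of universes are hypothetical; the cumulants listed are the standard ones for the unit rectangular and unit double-exponential densities.

```python
from math import sqrt
from statistics import NormalDist


def mean_percentage_point(eps, N, kappa1, kappa2, gamma2):
    """First-order Cornish-Fisher point of the mean of N observations.

    Assumes a symmetrical universe (kappa3 = 0) and an upper-tail eps.
    gamma2 = kappa4 / kappa2**2 is the standardized fourth cumulant.
    """
    z = NormalDist().inv_cdf(1.0 - eps)  # upper-tail normal deviate
    correction = (gamma2 / (24.0 * N)) * (z ** 3 - 3.0 * z)
    return kappa1 + sqrt(kappa2 / N) * (z + correction)


# (kappa2, gamma2) for two of the universes named in the abstract
UNIVERSES = {
    "rectangular": (1.0 / 12.0, -1.2),   # uniform on (-1/2, 1/2)
    "double-exponential": (2.0, 3.0),    # density (1/2) exp(-|x|)
}

if __name__ == "__main__":
    for name, (kappa2, gamma2) in UNIVERSES.items():
        for eps in (0.001, 0.025, 0.25):
            print(name, eps, mean_percentage_point(eps, 10, 0.0, kappa2, gamma2))
```

With only one correction term the values should be treated as approximate, especially at the extreme tail and for small N.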





NEWS AND NOTICES
Readers are invited to submit to the Secretary of the Institute news items of interest 
Personal Items 


Professor T. A. Bickerstaff has been appointed Chairman of the Department 
of Mathematics at the University of Mississippi. 

Professor Raj Chandra Bose has resigned as head of the graduate Department 
of Statistics of the University of Calcutta, and has been appointed Professor of 
Mathematical Statistics at the University of North Carolina beginning in the 
winter of 1949. Professor Bose is an authority on the design of experiments 
and is writing a book on the combinatorial mathematics of the subject. He has 
also published extensive contributions to differential geometry and to multi- 
variate statistical analysis, and has been instrumental in developing practical 
sample surveys. He served as Visiting Professor in the Institute of Statistics 
at North Carolina in the winter and spring of 1948. 

Mr. Hamilton Brooks’s paper, ‘““The Probable Breakdown Voltage of Paper 
Dielectric Capacitors,’’ was one of the four papers selected for a national award 
by the American Institute of Electrical Engineers. His paper presents the sta- 
tistical treatment of an engineering problem and shows by experiment how 
insulation strength distribution is determined by the distribution of the extreme 
size of flaws. 

Dr. C. West Churchman, formerly a member of the staff at the University of 
Pennsylvania, was appointed Associate Professor of Philosophy at Wayne Uni- 
versity, Detroit 1, Michigan, starting February 1, 1948. 

Dr. William G. Cochran has accepted an appointment as Professor of Bio- 
statistics in the School of Hygiene and Public Health of the Johns Hopkins Uni- 
versity and will assume this post in September. Dr. Cochran, a native of 
Glasgow, Scotland, comes to Johns Hopkins from the University of North Caro- 
lina where he served as Associate Director of the Institute of Statistics from 1946 
until the present. 

Dr. Louis M. Court has been promoted to an assistant professorship in the 
Mathematics Department of Rutgers University. 

Dr. Donald A. Darling, formerly a member of the staff at Cornell University, 
has accepted an assistant professorship at Rutgers University. 

Mr. Aryeh Dvoretzky has been appointed a member of the Institute for Ad- 
vanced Study, Princeton, New Jersey, for the 1948-1949 academic year. 

Mr. Arnold King, formerly Director of Research in Statistical Methodology 
for the Bureau of Agricultural Economics at Iowa State College, was appointed 
Managing Director of National Analysts, Inc., Philadelphia on July 1, 1948. 

Mr. Charles L. Marks has resigned his position as instructor of mathematics 
at the University of North Carolina to accept a teaching appointment in the 
Department of Statistics, The George Washington University, Washington 6, 
D.C. 

Miss Doris Newman has accepted an appointment at the U.S. Naval Medical
Research Laboratory, U.S. Naval Submarine Base, New London, Conn. 

Dr. Ernest Rubin has been transferred from the Immigration and Naturaliza- 
tion Service, General Research Section, Washington, D. C. to the European 
Branch, Areas Division, Office of International Trade in the Department of 
Commerce as an Economic Statistician. 

Mr. David Rubinstein has been promoted from Junior Research Assistant in 
the Statistical Laboratory, University of California, Berkeley, to a Teaching 
Assistant. 

Miss Elizabeth L. Scott, formerly an Associate and Research Assistant in the 
Statistical Laboratory, University of California, Berkeley, has been promoted to 
Lecturer and Research Assistant. 

Dr. Gobind R. Seth, who was formerly a student at Columbia University, has 
accepted an associate professorship in statistics at the Statistical Laboratory, 
Iowa State College. 

Dr. Charles M. Stein has been promoted to an assistant professorship in the 
Statistical Laboratory, University of California, Berkeley. 

Professor Gerhard Tintner is on leave of absence for one year from the Iowa 
State College to join the Department of Applied Economics at Cambridge Uni- 
versity, Cambridge, England as a Research Associate. 

Mr. L. H. C. Tippett, Chief Statistician of the British Cotton Industry Re- 
search Association, delivered twelve one-hour lectures on Statistical Quality Con- 
trol and Industrial Experimentation at a conference at the Massachusetts Institute 
of Technology, May 5-14, before a large audience. Dr. W. A. Shewhart of the 
Bell Telephone Laboratories addressed a large audience on the Future of Statistics
in Industrial Research and Quality Control on May 14 at the same conference. 




Scientists and Reserve Officers 


The Department of the Army has established a program of particular interest 
to statisticians and other scientists who hold Reserve commissions in the Army, 
and who are professionally engaged in teaching or research and development. 

The objectives of the program are to: 

(1) maintain the useful affiliation of statisticians and other scientists with the
Organized Reserve Corps,
(2) provide peacetime Reserve assignments for these officers, enabling op-
timum utilization of their education, experience and skills,
(3) furnish mobilization assignments which will fully utilize their talents, and
(4) adequately prepare these officers for mobilization.

The Technical Services of the Department of the Army submit to these Re- 
search and Development Reserve Groups research problems and projects which 
pose an intellectual challenge to members of the group. Thus, the program 
provides members of each group a type of training which is in keeping with their 
scientific and technical interests and competence, rather than a traditional 
kind of training session in which scientists have little or no interest. 

The program is now being implemented only in those areas where there is a
definite local interest. To date, eighteen Research and Development Reserve 
groups have been organized. Twelve additional groups are in process of organi- 
zation. Others are in the initial stages of formation. Several of these groups 
have been formed in communities in which large universities, industrial research 
laboratories, or private research foundations are located. Typical localities are 
Chicago, Illinois; Wilmington, Delaware; Newark, New Jersey; Houston, Texas; 
Washington, D. C.; Manhattan and Lawrence, Kansas; Champaign-Urbana, 
Illinois; Pittsburgh, Pennsylvania; Denver, Colorado; and Detroit, Michigan. 

Provision is made to submit research projects of interest to all categories of 
scientists—chemists, physicists, engineers, geologists, geographers, psychologists, 
mathematicians, statisticians and all of the biological scientists. 

Reserve officers who are currently engaged in civilian research, college or 
university teaching, or industrial research or development, or who in the past, 
have had specific research experience are eligible to make application for assign- 
ment to an Organized Reserve Research and Development Group. A group 
may be organized in any locality where there are twenty (20) or more qualified 
officer scientists who desire to participate in the program. A subgroup may be 
organized with ten (10) qualified members. 

The program is under the general direction of the Research and Development 
Group, Logistics Division, General Staff, United States Army. The entire 
program is outlined in Department of the Army Circular Number 127, dated 5 
May 1948. 


Inquiry about organization of an Organized Reserve Research and Develop- 
ment Group or about assignment to a group already organized should be made 
of the Unit Instructor, ORC, or of the Senior Army Instructor, ORC, in the 
locality in which the officer resides. In localities in which a group has already 
been organized, the Commanding Officer of the group will consider applications 
for assignment of additional officers. 




New Members 
The following persons have been elected to membership in the Institute


(June 1 to August 15, 1948) 


Anderson, Hjalmar, Jr. (Univ. of Oregon Medical School) Student, Turner, Oregon.
Banerjee, Kali Shankar, M.A. (Calcutta Univ.) Statistician, Central Sugar Cane Research 
Station, P.O. Pusa, Bihar, India. 

Bordelon, Derrill Joseph, B.S. (Louisiana State Univ.) Associate Physicist with Naval
Ordnance Laboratory, 602 A Street S.E., Washington, D.C.

Cowan, David, B.S. (Tufts Univ.) Research Analyst, War Department, 89 Lewis Street, 
East Lynn, Massachusetts. 

Frederiksen, Norman, Ph.D. (Syracuse Univ.) Research Associate, Educational Testing 
Service; Associate Professor of Psychology, Princeton University, Educational Test- 
ing Service, Box 592, Princeton, N. J. 

Gehman, Harry M., Ph.D. (Univ. of Pennsylvania) Professor of Mathematics, University 
of Buffalo, 163 Winspear Avenue, Buffalo 15, New York. 

Hofmann, John E., A.M. (Univ. of Minnesota) Senior Research Fellow, 3222 Oakland
Street, Ames, Iowa.

Kimball, Allyn W., Jr., B.S. (Univ. of Buffalo) Research Statistician, Department of 
Biometrics, School of Aviation Medicine, Randolph Field, Texas. 

King, Edgar P., Jr., B.S. (Carnegie Institute of Technology) Teaching Assistant in Mathe- 
matics, Department of Mathematics, Carnegie Institute of Technology, Pittsburgh 
13, Pennsylvania. 

Link, Curtis K., B.S. (Univ. of Oregon) Graduate Student-Assistant, 750 W. 6th Street, 
Eugene, Oregon. 

Leider, Nathan, B.A. (College of the City of N. Y.) Mathematician P-2, 1841 Summit 
Place, N.W., Washington 9, D.C. 

Manos, Nicholas E., M.A. (Univ. of Calif.) Meteorologist and Statistician, 1424 Rhode 
Island Avenue, N.W., Washington 6, D.C.

Peters, Stefan, Ph.D. (Erlangen, Germany) Lecturer at the University of California, 1207
Peralta Avenue, Berkeley 6, California. 

Petrou, Nicholas V., M.Sc. (Harvard Univ.) Electrical Engineer, Project Engineer, West- 
inghouse Electric Corporation, 1844 Ardmore Blvd., Pittsburgh 21, Pennsylvania. 

Prakash, Aditya, M.A. (Univ. of Michigan) Student, c/o Mathematics Department, Uni- 
versity of Michigan, Ann Arbor, Michigan. 

Read, Robert R., B.S. (Oregon State College) Apprentice Engineer, Inventory and Costs
Division, Pacific Telephone and Telegraph Company, 3207 N.E. 30, Portland, Oregon.

Seiden, Esther, M.A. (Vilno, Poland) Research Assistant, Statistical Laboratory, Univer- 
sity of California, 2116 Derby Street, Berkeley 5, California. 

Sodano, John J., B.S. (Queens College) Student, Mathematical Statistics, Columbia Uni- 
versity, 172-15 93rd Avenue, Jamaica 3, New York. 

Stillinger, Richard C., M.S. (Univ. of Michigan) Graduate Student, 1368 Weston Court, 
Willow Run, Michigan. 

Swan, Albert W., B.A.Sc. (Univ. of Toronto) Statistical Section Research and Develop- 
ment Department, The United Steel Company Limited, c/o The United Steel Com- 
panies Ltd., 17 Westbourne Road, Sheffield 10, England. 

Tate, Robert F., A.B. (Univ. of Calif.) Teaching Fellow, Department of Mathematical 
Statistics, Phillips Hall, Chapel Hill, North Carolina. 

Teichroew, Dan, B.A. (Univ. of Toronto) Division of Research, Department of Lands and 
Forest, South Baymouth, Ontario, Canada. 

Tyler, Leona E., Ph.D. (Univ. of Minnesota) Associate Professor of Psychology, Depart- 
ment of Psychology, University of Oregon, Eugene, Oregon. 


ADOPTION OF THE NEW CONSTITUTION 


The chief order of business at the business meeting of the Institute held at 
Madison, Wisconsin on September 10, 1948, was the adoption of the new Consti- 
tution. The draft mailed to the members in August, 1948, was adopted unani- 
mously after two changes had been made. They were: (1) the insertion of the 
word “Article” before each of the respective articles and (2) the elimination of 
the first “the” in the third line and fourth paragraph of Article 4. 

Other business transacted at the meeting included a report of the Secretary- 
Treasurer on the financial condition of the Institute indicating that while the 
Institute is just operating within its income during 1948, steps will have to be 
taken to provide the additional revenue needed for 1949. It was decided not to 
raise dues for 1949 but to attempt to raise additional funds by: (1) an immediate 
appeal to universities and other institutions which are sponsoring research in 
mathematical statistics for contributions to the Institute and (2) an appeal to 
the members of the Institute to make additional contributions at the time of the 
payment of their annual dues. 

Other matters under consideration at the meeting included a reading and dis- 
cussion of a proposed revision of the By Laws, the announcement of the dates 
and locations of future meetings of the Institute and the passing of a resolution 
of thanks to those contributing to the success of the Madison meeting. 

A copy of the official minutes of this meeting may be obtained on request from 
the Secretary-Treasurer. 

P. S. Dwyer 
Secretary-Treasurer 





REPORT ON THE MADISON MEETING OF THE INSTITUTE 


The Eleventh Summer Meeting of the Institute of Mathematical Statistics 
was held at the University of Wisconsin, Madison, Wisconsin, Tuesday, Sep- 
tember 7 through Friday, September 10, 1948. The meeting was held in con- 
junction with the summer meetings of the American Mathematical Society, the 
Mathematical Association of America and the Econometric Society. The follow- 
ing eighty members of the Institute attended the meeting: 


C. B. Allendoerfer, V. L. Anderson, K. J. Arnold, H. M. Bacon, A. S. Barr, Walter Bartky, 
H. P. Beard, A. A. Bennett, T. A. Bickerstaff, J. H. Bushey, Maria Castellani, Uttam Chand, 
Herman Chernoff, C. C. Craig, J. H. Curtiss, G. B. Dantzig, D. B. De Lury, J. L. Doob, A. M. 
Dutton, P. S. Dwyer, Mrs. Daisy Edwards, Churchill Eisenhart, H. P. Evans, C. H. Fischer, 
J. E. Freund, H. M. Gehman, H. H. Germond, M. A. Girshick, Casper Goffman, P. R. Halmos, 
W. G. Hart, E. H. C. Hildebrandt, Wassily Hoeffding, D. G. Horvitz, Harold Hotelling, A. S. 
Householder, M. H. Ingraham, Leo Katz, Oscar Kempthorne, J. F. Kenney, W. M. Kincaid, 
T. C. Koopmans, H. D. Larsen, Walter Leighton, H. B. Mann, A. M. Mark, Jacob Marschak, 
A. W. Marshall, Kenneth May, M. R. Mickey, Jr., Dorothy J. Morrow, C. J. Nesbitt, M. J. 
Netzorg, John von Neumann, Jerzy Neyman, G. B. Price, C. J. Rees, J. S. Rhodes, P. R. 
Rider, F. D. Rigby, Herman Rubin, Arthur Sard, Henry Scheffé, E. D. Schell, I. E. Segal, 
G. R. Seth, W. B. Simpson, Andrew Sobczyk, E. W. Stacy, C. M. Stein, A. G. Swanson, 
Zenon Szatrowski, R. M. Thrall, A. W. Tucker, J. W. Tukey, W. A. Wallis, J. E. Walsh, 
J. E. Wilkins, Jr., S. S. Wilks, M. A. Woodbury. 


The Tuesday morning session was devoted to contributed papers. Professor 
K. J. Arnold of the University of Wisconsin presided. The attendance was 
approximately forty. The following papers were presented: 


1. On Distribution-free Confidence Intervals. Preliminary Report. 
Dr. Wassily Hoeffding, Institute of Statistics, University of North Carolina. 

2. On Certain Statistics for Samples of 3 from a Normal Population. 
Mr. Julius Lieblein, Statistical Engineering Laboratory, National Bureau of Stand- 
ards. Presented by Dr. Churchill Eisenhart. 

3. On Multinomial Distributions with Limited Freedom: A Stochastic Genesis of Pareto’s 
and Pearson’s Curves. 
Professor Maria Castellani, University of Kansas City. 

4. Fitting Generalized Truncated Normal Distributions. 
Professor Harold Hotelling, Institute of Statistics, University of North Carolina. 

5. On the Distribution of the Two Closest Observations Among a Set of Three Independent 
Observations. 
Professor G. R. Seth, Statistical Laboratory, Iowa State College. 

6. The Derivation of Certain Recurrence Formulae and their Application to the Extension 
of Existing Published Incomplete Beta Function Tables. 
Dr. T. A. Bancroft, Alabama Polytechnic Institute. (Presented by title.) 


On Tuesday afternoon a session for contributed papers was held jointly with 
the American Mathematical Society. Professor P. S. Dwyer of the University 
of Michigan presided. The attendance was approximately eighty. The follow- 
ing papers were presented: 

7. Asymptotic Studentization in Testing Hypothesis. 
Dr. Herman Chernoff, Cowles Commission, University of Chicago. 

8. Completeness, Similar Regions and Unbiased Estimation. Preliminary Report. 
Professor E. L. Lehmann, University of California and Professor Henry Scheffé, Uni- 
versity of California at Los Angeles. 

9. On a Proposed Method for Estimating Populations. 

Professor C. C. Craig, University of Michigan. 

10. Some Results on the Asymptotic Distribution of Maximum- and Quasi-maximum-likeli- 
hood Estimates. 

Dr. Herman Rubin, Institute for Advanced Study. 

11. The Probability Points of the Distribution of the Median in Random Samples from any 
Continuous Population. 

Dr. Churchill Eisenhart, Lola S. Deming and Celia S. Martin, Statistical Engineering 
Laboratory, National Bureau of Standards. 

12. On the Arithmetic Mean and the Median in Small Samples from the Normal and Certain 
Non-normal Populations. 

Dr. Churchill Eisenhart, Lola S. Deming and Celia S. Martin, Statistical Engineering 
Laboratory, National Bureau of Standards. 

13. The Relative Frequencies with which Certain Estimators of the Standard Deviation of a 
Normal Population Tend to Underestimate Its Value. 

Dr. Churchill Eisenhart and Celia S. Martin, Statistical Engineering Laboratory, 
National Bureau of Standards. 

14. Some Non-parametric Tests of Whether the Largest Observations of a Set are too Large. 
Preliminary Report. 

Dr. J. E. Walsh, Project Rand, Santa Monica, California. 

15. On Some Bounded Significance Level Properties of the Equal-tail Sign Test for the 
Mean. 

Dr. J. E. Walsh, Project Rand, Santa Monica, California. (Presented by title.) 
16. Infinitely Divisible Distributions. 
Professor Will Feller, Cornell University. (Presented by title.) 
17. Fluctuation Theory of Recurrent Events. 
Professor Will Feller, Cornell University. (Presented by title.) 
18. Formulae for the Percentage Points of the Distributions of the Arithmetic Mean in 
Random Samples from Certain Symmetrical Universes. 


Mr. Uttam Chand, University of North Carolina and National Bureau of Standards. 
(Presented by title.) 


Abstracts of the contributed papers appear elsewhere in this issue of the Annals. 

On Wednesday morning the Institute and the Econometric Society held a joint 
session on Stochastic Processes with Professor Harold Hotelling of the University 
of North Carolina presiding. Attendance was approximately ninety. Professor 
Hotelling presented an Historical Summary of the Problem. Professor J. L. Doob 
of the University of Illinois presented a paper, Stochastic Differences Equations 
and Stochastic Differential Equations. Professor Subrahmanyan Chandrasekhar 
of the University of Chicago presented a paper, Brownian Motion, Dynamical 
Friction and Stellar Dynamics. 

The three joint sessions of the Institute and the Econometric Society on Thurs- 
day were devoted to a Symposium on the Theory of Games. The maximum 
attendance was approximately three hundred. The first morning session was 
held under the chairmanship of Professor S. S. Wilks of Princeton University. 
Professor John von Neumann of the Institute for Advanced Study presented a 
paper, Survey of the Theory of Games. Professor Oskar Morgenstern of Princeton 
University presented a paper, Economics and the Theory of Games. Dr. M. A. 
Girshick of Project Rand presented a paper, Statistics and the Theory of Games. 
The second morning session was under the chairmanship of Professor John von 
Neumann of the Institute for Advanced Study. Dr. E. W. Paxson of Project 
Rand presented a paper, Recent Developments. Professor J. W. Tukey of Prince- 
ton University presented a paper, A Problem in Strategy. Dr. G. B. Dantzig of 
the Army Air Forces presented a paper, Programming in a Linear Structure. The 
final session of the symposium was a round table discussion with Professor John 
von Neumann of the Institute for Advanced Study as chairman and with the 
following participants: Dr. G. B. Dantzig, Dr. M. A. Girshick, Professor Harold 
Hotelling, Professor Irving Kaplansky, Professor Samuel Karlin, Dr. J. C. C. 
McKinsey, Professor Oskar Morgenstern, Dr. E. W. Paxson, Dr. L. S. Shapley, 
and Professor J. W. Tukey. 

A membership business meeting was held on Friday, September 10, in Bascom 
Hall at which twenty-one members were present. An account of the business 
transacted at this meeting may be found elsewhere in this issue under the heading 
“Adoption of a New Constitution.” 

The final session was on Sequential Estimation and was held jointly with the 
Econometric Society on Friday morning with Professor Jerzy Neyman of the 
University of California presiding. Attendance was approximately fifty. Pro- 
fessor Charles Stein of the University of California presented a paper on Sequen- 
tial Estimation. Professor W. A. Wallis of the University of Chicago presented 
a discussion. 

Social affairs during the meeting included a tea Tuesday afternoon, a concert 
of the Pro Arte String Quartet Tuesday evening, a dinner Wednesday evening, 
a picnic Thursday afternoon, and a beer party Thursday evening. 

K. J. ARNOLD 
Assistant Secretary 